Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Support-bundle should not set error on optional phase #6898

Closed
w13915984028 opened this issue Oct 28, 2024 · 4 comments
Closed

[BUG] Support-bundle should not set error on optional phase #6898

w13915984028 opened this issue Oct 28, 2024 · 4 comments
Assignees
Labels
area/support-bundle-kit kind/bug Issues that are defects reported by users or that we know have reached a real release reproduce/always Reproducible 100% of the time reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one severity/3 Function working but has a major issue w/ workaround
Milestone

Comments

@w13915984028
Copy link
Member

w13915984028 commented Oct 28, 2024

Describe the bug

Support-bundle can't work unless disable rancher-monitoring addon

#6891 (comment)

image

It caused the rancher-monitoring related POD logs can't be collected.

To Reproduce
Steps to reproduce the behavior:

  1. It happend per issue [BUG] Pod prometheus-rancher-monitoring can not be started after v1.3.2 => v1.4.0-rc4 upgrade  #6891 when prometheus can't work properly

Expected behavior

SB should work as reliable as possible

https://github.com/rancher/support-bundle-kit/blob/6bd0407236d20de97b38c84a31944fe2d36d9949/pkg/manager/manager.go#L285

Prometheus is broken, above code will return error

...
	if managerStatus.Error {
		return h.setError(sb, managerStatus.ErrorMessage) // will cause SB to be aborted
	}

https://github.com/rancher/support-bundle-kit/blob/6bd0407236d20de97b38c84a31944fe2d36d9949/pkg/manager/manager.go#L189

func (m *SupportBundleManager) runPhase(phase RunPhase, progressCount *int, maxProgressCount int) error {
	logrus.Infof("Running phase %s", phase.Name)
	m.status.SetPhase(phase.Name)
	if err := phase.Run(); err != nil {
		m.status.SetError(err.Error())
		logrus.Errorf("Failed to run phase %s: %s", phase.Name, err.Error())
		return err
	}
...

func (m *SupportBundleManager) runAllPhases(requiredPhases []RunPhase, optionalPhases []RunPhase, postPhases []RunPhase) {

...

	for _, phase := range optionalPhases {
		if err := m.runPhase(phase, &progressCount, maxProgressCount); err != nil {
			logrus.Errorf("Failed to run optionalPhases %s: %s", phase.Name, err.Error())
			// Since it's optional, don't return error.
			continue
		}
	}

Support bundle

Environment

  • Harvester ISO version: Harvester v1.4.0-rc4
  • Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630):

Additional context

@w13915984028 w13915984028 added kind/bug Issues that are defects reported by users or that we know have reached a real release severity/3 Function working but has a major issue w/ workaround area/support-bundle-kit reproduce/always Reproducible 100% of the time reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one labels Oct 28, 2024
@w13915984028 w13915984028 added this to the v1.5.0 milestone Oct 28, 2024
@Yu-Jack Yu-Jack self-assigned this Oct 28, 2024
@Yu-Jack
Copy link
Contributor

Yu-Jack commented Oct 28, 2024

I also notice that there is a bug for progressCount. If the optional phase gets error, the progressCount doesn't increase. But, it should increase since it's optional. I'll fix this as well.

@harvesterhci-io-github-bot
Copy link

harvesterhci-io-github-bot commented Oct 29, 2024

Pre Ready-For-Testing Checklist

  • If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted?
    The HEP PR is at:

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: Please use v1.4-head to test this feature. Follow steps on this comment to create a fake Prometheus Pod to test.

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Have the backend code been merged (harvester, harvester-installer, etc) (including backport-needed/*)?
    The PR is at:
    bump: support bundle kit to v0.0.46 #6905
    fix: shouldn't set error in optional phases rancher/support-bundle-kit#126

    • Does the PR include the explanation for the fix or the feature?

    • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
      The PR for the YAML change is at:
      The PR for the chart change is at:

  • If labeled: area/ui Has the UI issue filed or ready to be merged?
    The UI issue/PR is at:

  • If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged?
    The documentation/KB PR is at:

  • If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

    • The automation skeleton PR is at:
    • The automation test case PR is at:
  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at:

@harvesterhci-io-github-bot

Automation e2e test issue: harvester/tests#1632

@brandboat
Copy link
Member

Test OK on v1.4-head, will close the issue

Environment

  • single node (12C/16G/500GB)
  • v1.4-head (c6facee)

Test Plan

  • Follow bump: support bundle kit to v0.0.46 #6905, and I can generate support bundle by either enable/disable rancher-monitoring, and when using fake prometheus, I can see logs as follows (support bundle still successfully created)
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/support-bundle-kit kind/bug Issues that are defects reported by users or that we know have reached a real release reproduce/always Reproducible 100% of the time reproduce/needed Reminder to add a reproduce label and to remove this one severity/needed Reminder to add a severity label and to remove this one severity/3 Function working but has a major issue w/ workaround
Projects
None yet
Development

No branches or pull requests

4 participants