Add safe_reload and interface check for process monitoring and container check tests #16252

Merged
merged 2 commits into from
Dec 28, 2024

yejianquan
Collaborator

Description of PR

Summary:
We have seen quite a lot of flaky failures in the PR test:
https://elastictest.org/scheduler/testplan/676e75562c6c7b8d3e3bd8bf?testcase=process_monitoring%2Ftest_critical_process_monitoring.py&type=console
Simply waiting for 120s is not enough on multi-ASIC KVM testbeds. This PR changes the config_reload call to use safe_reload to make sure the testbed is healthy.

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

We have seen quite a lot of flaky failures in the PR test:
https://elastictest.org/scheduler/testplan/676e75562c6c7b8d3e3bd8bf?testcase=process_monitoring%2Ftest_critical_process_monitoring.py&type=console

How did you do it?

Simply waiting for 120s is not enough on multi-ASIC KVM testbeds. This PR changes the config_reload call to use safe_reload to make sure the testbed is healthy.
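
For illustration only, the difference between a fixed wait and a health-gated wait can be sketched as below. This is a minimal, generic sketch and not the sonic-mgmt implementation: `wait_until_healthy`, `FakeDut`, and `critical_services_up` are hypothetical names invented for this example, and the real health checks performed by safe_reload are not reproduced here.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have elapsed.

    Hypothetical helper: returns True as soon as the check passes,
    or False if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDut:
    """Hypothetical stand-in for a DUT whose services come up after N polls."""

    def __init__(self, polls_until_ready: int) -> None:
        self.polls_left = polls_until_ready

    def critical_services_up(self) -> bool:
        # Each call simulates one health probe; the DUT becomes ready
        # once the configured number of probes has been made.
        self.polls_left -= 1
        return self.polls_left <= 0


if __name__ == "__main__":
    dut = FakeDut(polls_until_ready=3)
    # A fixed sleep would either waste time or return too early;
    # polling returns as soon as the DUT reports healthy.
    ok = wait_until_healthy(dut.critical_services_up, timeout=5.0, interval=0.1)
    print("healthy" if ok else "timed out")
```

With a fixed sleep, a slow multi-ASIC testbed may still be starting services when the test resumes. Gating on a health check makes the wait as long as needed, up to the timeout, and turns a silent race into an explicit timeout result.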

How did you verify/test it?

The PR test run will verify it.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@yejianquan
Collaborator Author

FYI @bingwang-ms, this failure is very likely to reproduce on the 202405 branch.
Could you please review and cherry-pick this PR to 202405?

Contributor

@lerry-lee left a comment

LGTM

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@yejianquan merged commit df7cfc4 into sonic-net:master on Dec 28, 2024
17 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 28, 2024
…ner check tests (sonic-net#16252)

authorized by: jianquanye@microsoft.com
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 28, 2024
…ner check tests (sonic-net#16252)
@mssonicbld
Collaborator

Cherry-pick PR to 202405: #16257

@mssonicbld
Collaborator

Cherry-pick PR to 202411: #16258

mssonicbld pushed a commit that referenced this pull request Dec 28, 2024
…ner check tests (#16252)
mssonicbld pushed a commit that referenced this pull request Dec 29, 2024
…ner check tests (#16252)