Fixed determine/process reboot-cause service dependency (#17406) #132

Merged
merged 1 commit into from
Jun 28, 2024

anamehra
Contributor

Signed-off-by: anamehra <anamehra@cisco.com>

Why I did it

Fixes sonic-net/sonic-buildimage#16990 for 202405 branch

Cherry-picked PR sonic-net/sonic-buildimage#17406 due to conflict.

  1. The determine-reboot-cause and process-reboot-cause services do not start if the database service fails to come up on the first attempt. Even if the database service succeeds on a later attempt, these reboot-cause services still do not start.

  2. The process-reboot-cause service also does not restart when the docker or database service restarts, which leads to an empty reboot-cause history.

  3. deploy-mg from sonic-mgmt also triggers a docker service restart, which causes the issue described in item 2. The docker restart also restarts determine-reboot-cause, which creates an additional reboot-cause file in the history and modifies the last reboot-cause.

This PR fixes these issues by starting both services once their dependency is met after an earlier dependency failure, restarting both services when the database service restarts, and preventing duplicate processing of the last reboot reason.

Work item tracking
  • Microsoft ADO 25892856

How I did it

  1. Modified the systemd unit files so that the determine-reboot-cause and process-reboot-cause services restart when the database service restarts.
  2. On a restart, the determine-reboot-cause service must not create a new reboot-cause entry in the database. Added a check that distinguishes a first start from a restart and skips the entry on a restart.
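
The first-start check in step 2 could look roughly like the sketch below. It is illustrative only and not taken from the PR diff: the marker file, its location, and the names `is_first_start` and `determine_reboot_cause` are assumptions made for the example. The idea is that a marker created on a tmpfs disappears on reboot, so only the first start after boot records an entry.

```python
import os
import tempfile


def is_first_start(marker_path):
    """Return True only the first time this is called for a given marker.

    Assumption: the marker lives on a tmpfs (for example under /run), so it
    is gone after a reboot and the next boot counts as a first start again.
    """
    try:
        # O_EXCL makes creation atomic: it fails if the marker already exists.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def determine_reboot_cause(marker_path, record_entry):
    """Record a reboot-cause entry on first start; skip it on a restart.

    `record_entry` is a hypothetical callback standing in for whatever
    writes the reboot-cause entry.
    """
    if is_first_start(marker_path):
        record_entry()
        return "recorded"
    return "skipped"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        marker = os.path.join(run_dir, "determine-reboot-cause.started")
        history = []

        def add_entry():
            history.append("reboot-cause entry")

        print(determine_reboot_cause(marker, add_entry))  # first start after boot
        print(determine_reboot_cause(marker, add_entry))  # restart with database
        print(len(history))
```

With this shape, a database-triggered restart of the service runs the same code path but leaves the history untouched.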

How to verify it

On a single-ASIC pizza box:

  1. Install the image and check the reboot-cause history.
  2. Restart the database service and verify that the determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and that no new entry is created for the restart.

On a chassis:

  1. Install the image and check the reboot-cause history.
  2. Restart the database service and verify that the determine-reboot-cause and process-reboot-cause services also restart. Verify that reboot-cause shows correct data and that no new entry is created for the restart.
  3. Reboot the LC. On the Supervisor, stop the database-chassis service.
    Let the database service on the LC fail the first time. determine-reboot-cause and process-reboot-cause fail to start because of the dependency failure.
    Start database-chassis on the Supervisor. The database service on the LC should now start successfully.
    Verify that determine-reboot-cause and process-reboot-cause also start.
    Verify the show reboot-cause history output.
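
The "restart the database service" check in step 2 can be scripted along the following lines. This is a rough sketch and not part of the PR: `check_after_database_restart` is a hypothetical helper, and the systemd unit name `database` is an assumption. The command runner is injectable so the comparison logic can be exercised off-box.

```python
import subprocess

# Services named in this PR.
SERVICES = ("determine-reboot-cause", "process-reboot-cause")


def run(cmd):
    """Run a command and return its stdout as text (errors are not raised)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def check_after_database_restart(run_cmd=run):
    """Restart the database service and compare reboot-cause history.

    Returns the reported systemd state of each reboot-cause service and
    whether the history output is identical before and after the restart.
    """
    before = run_cmd(["show", "reboot-cause", "history"])
    # Assumption: the database service is the systemd unit "database".
    run_cmd(["sudo", "systemctl", "restart", "database"])
    states = {
        svc: run_cmd(["systemctl", "is-active", svc]).strip() for svc in SERVICES
    }
    after = run_cmd(["show", "reboot-cause", "history"])
    return {"states": states, "history_unchanged": before == after}
```

The services restart asynchronously, so an on-box script would probably need to wait before sampling the states and the history. The helper only reports what `systemctl is-active` prints and does not assume a particular value, since the expected state depends on the unit type.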

Signed-off-by: anamehra <anamehra@cisco.com>
@anamehra
Contributor Author

@abdosi , @gechiang , for your viz! Thanks

@anamehra
Contributor Author

Hi @abdosi , @gechiang , changed base branch to master. Please review. Thanks

@abdosi abdosi merged commit 60fdfea into sonic-net:master Jun 28, 2024
5 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-host-services that referenced this pull request Jul 9, 2024
…ic-net#132)

@mssonicbld

Cherry-pick PR to 202405: #138

mssonicbld pushed a commit that referenced this pull request Jul 10, 2024
@anamehra anamehra deleted the anamehra/reboot_cause branch October 31, 2024 13:56