Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lock to config reload/load_minigraph #3475

Merged
merged 6 commits into from
Aug 20, 2024

Conversation

lolyu
Copy link
Contributor

@lolyu lolyu commented Aug 6, 2024

What I did

In some cases, if multiple config reload/load_minigraph are running in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so they will not run in parallel.
The file lock is binding to /etc/sonic/reload.lock.

This is to fix issue: #19855

  • Microsoft ADO (number only): 28877643

Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it

Add flock utility and decoate the reload and load_minigraph with the try_lock to ensure the lock is acquired before reload/load_minigraph.

How to verify it

UT and on testbed.

New command output (if the output of a command-line utility has changed)

  • reload with locking success
# config reload
Acquired lock on /etc/sonic/reload.lock
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
  • reload with locking failure
# config reload
Failed to acquire lock on /etc/sonic/reload.lock

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@lolyu lolyu marked this pull request as ready for review August 8, 2024 03:49
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@bingwang-ms
Copy link
Contributor

You may need to check all the reference of reload and load_minigraph as the signature is change. I remember the functions are called in some other scripts.
Another option is to add a default value for the new argument bypass_lock.

@lolyu
Copy link
Contributor Author

lolyu commented Aug 13, 2024

You may need to check all the reference of reload and load_minigraph as the signature is change. I remember the functions are called in some other scripts. Another option is to add a default value for the new argument bypass_lock.

bypass_lock is defaulted with false

bingwang-ms
bingwang-ms previously approved these changes Aug 13, 2024
zjswhhh
zjswhhh previously approved these changes Aug 13, 2024
Copy link
Contributor

@zjswhhh zjswhhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Should we cherry-pick this into other release branches?

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
@lolyu lolyu dismissed stale reviews from zjswhhh and bingwang-ms via 4f7c206 August 19, 2024 03:59
@lolyu
Copy link
Contributor Author

lolyu commented Aug 19, 2024

Hi @bingwang-ms @zjswhhh, added another commit to improve the log, please help review, thanks!

Copy link
Contributor

@zjswhhh zjswhhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@lolyu lolyu merged commit 4372ced into sonic-net:master Aug 20, 2024
7 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-utilities that referenced this pull request Aug 23, 2024
What I did
In some cases, if multiple config reload/load_minigraph are running in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so they will not run in parallel.
The file lock is binding to /etc/sonic/reload.lock.

This is to fix issue: #19855

Microsoft ADO (number only): 28877643
Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
Add flock utility and decoate the reload and load_minigraph with the try_lock to ensure the lock is acquired before reload/load_minigraph.

How to verify it
UT and on testbed.

New command output (if the output of a command-line utility has changed)
reload with locking success
# config reload
Acquired lock on /etc/sonic/reload.lock
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
reload with locking failure
# config reload
Failed to acquire lock on /etc/sonic/reload.lock
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #3497

mssonicbld pushed a commit that referenced this pull request Aug 24, 2024
What I did
In some cases, if multiple config reload/load_minigraph are running in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so they will not run in parallel.
The file lock is binding to /etc/sonic/reload.lock.

This is to fix issue: #19855

Microsoft ADO (number only): 28877643
Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
Add flock utility and decoate the reload and load_minigraph with the try_lock to ensure the lock is acquired before reload/load_minigraph.

How to verify it
UT and on testbed.

New command output (if the output of a command-line utility has changed)
reload with locking success
# config reload
Acquired lock on /etc/sonic/reload.lock
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
reload with locking failure
# config reload
Failed to acquire lock on /etc/sonic/reload.lock
lolyu added a commit to lolyu/sonic-utilities that referenced this pull request Aug 28, 2024
What I did
In some cases, if multiple config reload/load_minigraph are running in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so they will not run in parallel.
The file lock is binding to /etc/sonic/reload.lock.

This is to fix issue: #19855

Microsoft ADO (number only): 28877643
Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
Add flock utility and decoate the reload and load_minigraph with the try_lock to ensure the lock is acquired before reload/load_minigraph.

How to verify it
UT and on testbed.

New command output (if the output of a command-line utility has changed)
reload with locking success
Acquired lock on /etc/sonic/reload.lock
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
reload with locking failure
Failed to acquire lock on /etc/sonic/reload.lock
lolyu added a commit to lolyu/sonic-utilities that referenced this pull request Aug 28, 2024
What I did
In some cases, if multiple config reload/load_minigraph are running in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so they will not run in parallel.
The file lock is binding to /etc/sonic/reload.lock.

This is to fix issue: #19855

Microsoft ADO (number only): 28877643
Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
Add flock utility and decoate the reload and load_minigraph with the try_lock to ensure the lock is acquired before reload/load_minigraph.

How to verify it
UT and on testbed.

New command output (if the output of a command-line utility has changed)
reload with locking success
Acquired lock on /etc/sonic/reload.lock
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
reload with locking failure
Failed to acquire lock on /etc/sonic/reload.lock

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
yxieca pushed a commit that referenced this pull request Aug 28, 2024
What I did
In some cases, if multiple config reload/load_minigraph are running in parallel, they might leave the system in an error state.
In this PR, a flock is added to config reload/load_minigraph so they will not run in parallel.
The file lock is binding to /etc/sonic/reload.lock.

This is to fix issue: #19855

Microsoft ADO (number only): 28877643
Signed-off-by: Longxiang Lyu lolv@microsoft.com

How I did it
Add flock utility and decoate the reload and load_minigraph with the try_lock to ensure the lock is acquired before reload/load_minigraph.

How to verify it
UT and on testbed.

New command output (if the output of a command-line utility has changed)
reload with locking success
Acquired lock on /etc/sonic/reload.lock
Clear current config and reload config in config_db format from the default config file(s) ? [y/N]: y
Disabling container monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
reload with locking failure
Failed to acquire lock on /etc/sonic/reload.lock

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants