Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[swss] Chassis db clean up optimization and bug fixes (#16454) #16540

Merged
merged 1 commit into from
Sep 14, 2023

Conversation

vganesan-nokia
Copy link
Contributor

Why I did it

The PR is for the following:

Work item tracking

N/A

How I did it

  • Fix for regression failure due to error in finding CHASSIS_APP_DB in pizzabox (#PR 16451)
    • Chassis clean up code is re-arranged to avoid accessing CHASSIS_APP_DB for non voq switches.
  • After attempting to delete the system neighbor entries from chassis db, before starting clearing the system interface entries,
    wait for sometime only if some system neighbors were deleted. If there are no system neighbors entries deleted for the asic coming up, no need to wait.
  • Similar changes for system lag delete. Before deleting the system lag, wait for some time only if some system lag memebers were deleted. If there are no system lag members deleted no need to wait.
  • For the above two, the redis eval script to delete entries from chassis app db is made to return the number of entries deleted. Delay is introduced only if the number of entries deleted is greater than 0.
  • Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While asic is coming up, when system neigh entries are deleted from chassis ap db (as part of chassis db clean up), there is no orchs/process running to process the delete messages from chassis redis. Because of this, stale system neigh entries are present in the local STATE_DB. The stale entries result in
    creation of orphan (no corresponding data path/asic db entry) kernel neigh entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after the swss serive came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
    the local STATE_DB when sevice comes up.

How to verify it

  • Tested during config reload and chassis reboot.
  • When asic without system neigbor entries and system lag member entries are restarted it was observed that delay was not introduced. This was indirectly verified by checking the log for the number of entries deleted.
  • No stale kernel neighbors were found after swss came up.
  • Tested in pizzabox for config reload. The error "Invalid database name input : 'CHASSIS_APP_DB'" was not seen.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

master

Description for the changelog

Changes done in swss.sh service start script in function clean_up_chassis_db_tables().

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

* [swss] Chassis db clean up optimization and bug fixes

This commit includes the following changes:
    - Fix for regression failure due to error in finding CHASSIS_APP_DB in
    pizzabox (#PR 16451)
    - After attempting to delete the system neighbor entries from
    chassis db, before starting clearing the system interface entries,
    wait for sometime only if some system neighbors were deleted.
    If there are no system neighbors entries deleted for the asic coming up,
    no need to wait.
    - Similar changes for system lag delete. Before deleting the
    system lag, wait for some time only if some system lag memebers were
    deleted. If there are no system lag members deleted no need to wait.
    - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While asic
    is coming up, when system neigh entries are deleted from chassis ap
    db (as part of chassis db clean up), there is no orchs/process running to
    process the delete messages from chassis redis. Because of this, stale system
    neigh are entries present in the local STATE_DB. The stale entries result in
    creation of orphan (no corresponding data path/asic db entry) kernel neigh
    entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
    the swss serive came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
    the local STATE_DB when sevice comes up.

Signed-off-by: vedganes <veda.ganesan@nokia.com>

* [swss] Chassis db clean up bug fixes review comment fix - 1

Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)

Signed-off-by: vedganes <veda.ganesan@nokia.com>

---------

Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41f)
@deepak-singhal0408
Copy link
Contributor

Microsoft ADO: 25128972

@deepak-singhal0408
Copy link
Contributor

deepak-singhal0408 commented Sep 14, 2023

@lguohan can you approve this manual cherry pick PR..

Thanks

@yxieca yxieca merged commit 9ffa92c into sonic-net:202211 Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants