Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dualtor t0 mock orch crash #15628

Merged

Conversation

justin-wong-ce
Copy link
Contributor

@justin-wong-ce justin-wong-ce commented Nov 19, 2024

Description of PR

Summary:
Fixes dualtor tests when ran with mocked setup on t0 topologies.

This PR fixes 2 issues:

  1. Fix orchagent crash when setting up dualtor mock environment on a t0 topo with Broadcom platform.
    When running dualtor tests on t0 topo, the DUT has to enter a mocked dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB, which involves a Broadcom SAI attribute that is only supported when sai_tunnel_support=1 is set in syncd:/etc/sai.d/config.bcm - this attribute is not set until apply_peer_switch_table_to_dut() is run.
    Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

  2. Fix dualtor mocked tests overwriting /etc/sonic/config_db.json
    When running dualtor tests on a t0 topology, the test will overwrite /etc/sonic/config_db.json during the test, causing config_reload() at the end of the test to not restore the pre-test state of CONFIG_DB.

Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

How did you do it?

  1. Fix orchagent crash by first running the setup function apply_peer_switch_table_to_dut() that will set sai_tunnel_support=1, before adding the tunnel table with apply_tunnel_table_to_dut().

  2. Fix config mismatch by adding a fixture to backup /etc/sonic/config_db.json before the test, then restore and config reload -y it after the test. Change config reload source config file to running_golden_config instead of minigraph.

How did you verify/test it?

  1. orchagent no longer crashes.

  2. config_db.json is reverted after tests are run in each test file. However, config_check may still fail between test cases as some files only apply the fixtures once per test file instead of once per test case. This is okay as config_check fail will only cause a warning log. Configs are restored properly now.

Any platform specific information?

orchagent crash seems to only occur to Broadcom platform on Arista hwSkus.

Supported testbed topology if it's a new test case?

Documentation

When running dualtor tests on t0 topo, the DUT has to enter a mocked
dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB,
which involves a Broadcom SAI attribute that is only supported when
`sai_tunnel_support=1` is set in `syncd:/etc/sai.d/config.bcm` - this
attribute is not set until `apply_peer_switch_table_to_dut()` is run.

Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

Fix this issue by first running the setup function
`apply_peer_switch_table_to_dut()` that will set
`sai_tunnel_support=1`, before adding the tunnel table with
`apply_tunnel_table_to_dut()`.
When running dualtor tests on a t0 topology, the test will overwrite
`/etc/sonic/config_db.json` during the test, causing `config_reload()`
at the end of the test to not restore the pre-test state of CONFIG_DB.

Fix by adding a fixture to backup `/etc/sonic/config_db.json` before
the test, then restore and `config reload -y` it after the test.
@justin-wong-ce
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@lolyu lolyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@justin-wong-ce
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lolyu
Copy link
Contributor

lolyu commented Dec 8, 2024

Hi @StormLiangMS, please help review/merge.

@StormLiangMS StormLiangMS merged commit 8858f4e into sonic-net:master Dec 9, 2024
17 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 12, 2024
* Fix orchagent crash when setting up mock dualtor environment for t0

When running dualtor tests on t0 topo, the DUT has to enter a mocked
dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB,
which involves a Broadcom SAI attribute that is only supported when
`sai_tunnel_support=1` is set in `syncd:/etc/sai.d/config.bcm` - this
attribute is not set until `apply_peer_switch_table_to_dut()` is run.

Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

Fix this issue by first running the setup function
`apply_peer_switch_table_to_dut()` that will set
`sai_tunnel_support=1`, before adding the tunnel table with
`apply_tunnel_table_to_dut()`.

* Fix dualtor tests overwriting /etc/sonic/config_db.json

When running dualtor tests on a t0 topology, the test will overwrite
`/etc/sonic/config_db.json` during the test, causing `config_reload()`
at the end of the test to not restore the pre-test state of CONFIG_DB.

Fix by adding a fixture to backup `/etc/sonic/config_db.json` before
the test, then restore and `config reload -y` it after the test.

* Change to reuse old config restore and use running_golden_config source
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #16036

mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 12, 2024
* Fix orchagent crash when setting up mock dualtor environment for t0

When running dualtor tests on t0 topo, the DUT has to enter a mocked
dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB,
which involves a Broadcom SAI attribute that is only supported when
`sai_tunnel_support=1` is set in `syncd:/etc/sai.d/config.bcm` - this
attribute is not set until `apply_peer_switch_table_to_dut()` is run.

Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

Fix this issue by first running the setup function
`apply_peer_switch_table_to_dut()` that will set
`sai_tunnel_support=1`, before adding the tunnel table with
`apply_tunnel_table_to_dut()`.

* Fix dualtor tests overwriting /etc/sonic/config_db.json

When running dualtor tests on a t0 topology, the test will overwrite
`/etc/sonic/config_db.json` during the test, causing `config_reload()`
at the end of the test to not restore the pre-test state of CONFIG_DB.

Fix by adding a fixture to backup `/etc/sonic/config_db.json` before
the test, then restore and `config reload -y` it after the test.

* Change to reuse old config restore and use running_golden_config source
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 19, 2024
* Fix orchagent crash when setting up mock dualtor environment for t0

When running dualtor tests on t0 topo, the DUT has to enter a mocked
dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB,
which involves a Broadcom SAI attribute that is only supported when
`sai_tunnel_support=1` is set in `syncd:/etc/sai.d/config.bcm` - this
attribute is not set until `apply_peer_switch_table_to_dut()` is run.

Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

Fix this issue by first running the setup function
`apply_peer_switch_table_to_dut()` that will set
`sai_tunnel_support=1`, before adding the tunnel table with
`apply_tunnel_table_to_dut()`.

* Fix dualtor tests overwriting /etc/sonic/config_db.json

When running dualtor tests on a t0 topology, the test will overwrite
`/etc/sonic/config_db.json` during the test, causing `config_reload()`
at the end of the test to not restore the pre-test state of CONFIG_DB.

Fix by adding a fixture to backup `/etc/sonic/config_db.json` before
the test, then restore and `config reload -y` it after the test.

* Change to reuse old config restore and use running_golden_config source
mssonicbld pushed a commit to mssonicbld/sonic-mgmt that referenced this pull request Dec 19, 2024
* Fix orchagent crash when setting up mock dualtor environment for t0

When running dualtor tests on t0 topo, the DUT has to enter a mocked
dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB,
which involves a Broadcom SAI attribute that is only supported when
`sai_tunnel_support=1` is set in `syncd:/etc/sai.d/config.bcm` - this
attribute is not set until `apply_peer_switch_table_to_dut()` is run.

Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

Fix this issue by first running the setup function
`apply_peer_switch_table_to_dut()` that will set
`sai_tunnel_support=1`, before adding the tunnel table with
`apply_tunnel_table_to_dut()`.

* Fix dualtor tests overwriting /etc/sonic/config_db.json

When running dualtor tests on a t0 topology, the test will overwrite
`/etc/sonic/config_db.json` during the test, causing `config_reload()`
at the end of the test to not restore the pre-test state of CONFIG_DB.

Fix by adding a fixture to backup `/etc/sonic/config_db.json` before
the test, then restore and `config reload -y` it after the test.

* Change to reuse old config restore and use running_golden_config source
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202411: #16037

mssonicbld pushed a commit that referenced this pull request Dec 19, 2024
* Fix orchagent crash when setting up mock dualtor environment for t0

When running dualtor tests on t0 topo, the DUT has to enter a mocked
dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB,
which involves a Broadcom SAI attribute that is only supported when
`sai_tunnel_support=1` is set in `syncd:/etc/sai.d/config.bcm` - this
attribute is not set until `apply_peer_switch_table_to_dut()` is run.

Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

Fix this issue by first running the setup function
`apply_peer_switch_table_to_dut()` that will set
`sai_tunnel_support=1`, before adding the tunnel table with
`apply_tunnel_table_to_dut()`.

* Fix dualtor tests overwriting /etc/sonic/config_db.json

When running dualtor tests on a t0 topology, the test will overwrite
`/etc/sonic/config_db.json` during the test, causing `config_reload()`
at the end of the test to not restore the pre-test state of CONFIG_DB.

Fix by adding a fixture to backup `/etc/sonic/config_db.json` before
the test, then restore and `config reload -y` it after the test.

* Change to reuse old config restore and use running_golden_config source
mssonicbld pushed a commit that referenced this pull request Dec 28, 2024
* Fix orchagent crash when setting up mock dualtor environment for t0

When running dualtor tests on t0 topo, the DUT has to enter a mocked
dualtor state. Part of this setup is adding a tunnel table to CONFIG_DB,
which involves a Broadcom SAI attribute that is only supported when
`sai_tunnel_support=1` is set in `syncd:/etc/sai.d/config.bcm` - this
attribute is not set until `apply_peer_switch_table_to_dut()` is run.

Changing an unsupported Broadcom SAI attribute will cause orchagent to crash.

Fix this issue by first running the setup function
`apply_peer_switch_table_to_dut()` that will set
`sai_tunnel_support=1`, before adding the tunnel table with
`apply_tunnel_table_to_dut()`.

* Fix dualtor tests overwriting /etc/sonic/config_db.json

When running dualtor tests on a t0 topology, the test will overwrite
`/etc/sonic/config_db.json` during the test, causing `config_reload()`
at the end of the test to not restore the pre-test state of CONFIG_DB.

Fix by adding a fixture to backup `/etc/sonic/config_db.json` before
the test, then restore and `config reload -y` it after the test.

* Change to reuse old config restore and use running_golden_config source
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants