Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deterministic link bring-up feature for SFF compliant modules #383

Merged
merged 26 commits into from
Mar 7, 2024

Conversation

longhuan-cisco
Copy link
Contributor

@longhuan-cisco longhuan-cisco commented Jun 26, 2023

HLD update is in different PR (link: sonic-net/SONiC#1371)

Description

According to Interface-Link-bring-up-sequence.md, add a new thread sff_mgr under xcvrd to provide deterministic link bringup feature for SFF compliant modules (100G/40G).

By default sff_mgr is disabled, to enable sff_mgr on a platform, add 'enable_xcvrd_sff_mgr' and set it to true in pmon_daemon_ctrl json file.

Motivation and Context

  • Scope of sff_mgr: 100G/40G optics (copper/aoc not in the scope)

  • Why sff_mgr
    The goal of sff_mgr is to make sure SFF compliant modules are brought up in a deterministc way, meaning TX is enabled only after host_tx_ready becomes True, and TX will be disabled when host_tx_ready becomes False. This will help eliminate link stability issue and potential interface flap, also turning off TX reduces the power consumption and avoid any lab hazard for admin shut interface.

  • What sff_mgr does
    sff_mgr is a new thread inside Xcvrd. It will only turn on module Tx if both admin_status=up and host_tx_ready=true.
    sff_mgr will skip if neither of these below events happens: transceiver insertion event (including bootup and hot plugin), host_tx_ready change event and admin_status change event. All other cases are ignored.
    To detect these events, sff_mgr listens to below DB tables:

    • STATE_DB PORT_TABLE's host_tx_ready field for host_tx_ready change event.
    • STATE_DB TRANSCEIVER_INFO table's type field for insesrtion event.
    • CONFIG_DB PORT_TABLE's admin_status field for admin_status change event, and info such as index/channel/etc.
  • For platforms/vendors:
    This feature is enabled on per platform basis. There could be cases where vendor(s)/platform(s) may take time to shift from existing codebase to the model (work-flows).
    By default this feature is disabled. No impact for the platforms in current deployment. To enable it, a flag needs to be added to pmon_daemon_control.json and set to true.

    • pre-requisite to enable this feature: platform needs to keep TX in disabled state after module coming out-of-reset, in either module insertion or bootup cases. This is to make sure the module is not transmitting with TX enabled before host_tx_ready becomes True.

How Has This Been Tested?

  1. On platform with sff_mgr enabled which has 100G and 40G optics, verified TX is enabled/disabled properly based on host_tx_ready and in proper order, under cases of module insertion/system bootup. And no-op for the cases of xcvrd process restart.
  2. Verified TX is enabled/disabled properly along with "config interface shutdown/startup" (which changes admin_status and host_tx_ready value)
  3. Verified no impact on platform without sff_mgr enabled.

Additional Information (Optional)

@longhuan-cisco
Copy link
Contributor Author

@prgeor @mihirpat1 could you please take a look at the PR?

CC @shyam77git @jaganbal-a

sonic-xcvrd/xcvrd/xcvrd.py Outdated Show resolved Hide resolved
@longhuan-cisco longhuan-cisco requested a review from keboliu August 2, 2023 16:57
if mask == 0:
self.log_notice("{}: No change is needed for tx_disable value".format(lport))
continue
if api.tx_disable_channel(mask, target_tx_disable_flag):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we bring a module to high-power before enabling TX ? In CMIS State machine it is done as a part of CMIS Data Path SM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, non-cmis/ccmis optics are brought to High Power mode by default by platform.
The missing part is: enabling high power class by setting page 0 byte 93 bit 2(if advertised power class >=5). Yes, this handling will be added to sff_mgr in separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "platform", it means there could be different implementation. So, does SFF standard require "platform" to set module to high power? If not, I prefer to add a check here and make sure module is in high power

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "platform", it means there could be different implementation. So, does SFF standard require "platform" to set module to high power?

I didn't see standard mentions who (PI or platform) should do it.
In today's sonic PI code, there's no place to turn module to high power mode. Then I assume platform side does something (either via SW or HW), otherwise how 100G optics (which needs high power mode to be functional) can come up on today's system for all platforms/vendors.

I prefer to add a check here and make sure module is in high power

Yes, that was part of the plan for high power handling in sff_mgr, basically sff_mgr will do two things:

  1. check and make sure module in high power mode if needed
  2. check and enable high power class for module if needed (power class >= 5)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raised issue to track high power handling #414

else:
self.log_error("{}: Failed to {} TX with channel mask: {}".format(
lport, "disable" if target_tx_disable_flag else "enable", bin(mask)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to add the Custom module SI parameters here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, custom optics SI is supposed to be handled by sff_mgr. The detail is still in discussion. This will be added via separate PR also.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, when high-power and custom SI are supported the following sequence should be ensured for bringing a module up:

  1. rx_output_disable=1
  2. high-power
  3. Set customer SI (per lane speed)
  4. rx_output_disable=0
  5. tx_disable=0

Agree ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, when high-power and custom SI are supported the following sequence should be ensured for bringing a module up:

  1. rx_output_disable=1
  2. high-power
  3. Set customer SI (per lane speed)
  4. rx_output_disable=0
  5. tx_disable=0

Agree ?

At the time of high-power and custom SI are supported, 2, 3, 5 would be ensured.

sff spec doesn't seem to mention rx_output_disable/enable as mandatory, and they were not in the original discussion of sff_mgr. But 1 and 4 can be discussed or taken care of via subsequent PRs.

As long as we have the main structure of sff_mgr in place, it would be easy for people to add on top of it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised issue to track SI setting: #415

@prgeor
Copy link
Collaborator

prgeor commented Nov 27, 2023

@mihirpat1 could you review?

@prgeor prgeor added the xcvrd label Nov 27, 2023
@prgeor
Copy link
Collaborator

prgeor commented Dec 6, 2023

@longhuan-cisco could you check the conflict

@longhuan-cisco
Copy link
Contributor Author

@longhuan-cisco could you check the conflict

sure, addressed the conflict.

keboliu
keboliu previously approved these changes Dec 11, 2023
sonic-xcvrd/xcvrd/sff_mgr.py Outdated Show resolved Hide resolved
sonic-xcvrd/xcvrd/sff_mgr.py Outdated Show resolved Hide resolved
sonic-xcvrd/xcvrd/sff_mgr.py Outdated Show resolved Hide resolved
if mask == 0:
self.log_notice("{}: No change is needed for tx_disable value".format(lport))
continue
if api.tx_disable_channel(mask, target_tx_disable_flag):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "platform", it means there could be different implementation. So, does SFF standard require "platform" to set module to high power? If not, I prefer to add a check here and make sure module is in high power

sonic-xcvrd/xcvrd/sff_mgr.py Outdated Show resolved Hide resolved
sonic-xcvrd/xcvrd/sff_mgr.py Outdated Show resolved Hide resolved
sonic-xcvrd/xcvrd/xcvrd.py Outdated Show resolved Hide resolved
@longhuan-cisco
Copy link
Contributor Author

@longhuan-cisco why copper/AOC is not in the scope of SFF manager as per PR description?

AFAIK, those doesn't have ability of doing tx_disable/enable

@longhuan-cisco
Copy link
Contributor Author

Any updates on this PR? Haven't seen recent activity and would appreciate an ETA if possible.

@longhuan-cisco : Can you provide latest update on this.

Updated the diff and raised two other PR's based on latest comments from Prince's.

the other PRs:
#430
sonic-net/sonic-buildimage#18054

@rlhui
Copy link

rlhui commented Feb 23, 2024

please resolve conflicts

@longhuan-cisco
Copy link
Contributor Author

please resolve conflicts

conflicts resolved.

@longhuan-cisco longhuan-cisco requested a review from prgeor March 5, 2024 23:31
@judyjoseph
Copy link
Contributor

@prgeor Please merge if the review is completed.

@prgeor
Copy link
Collaborator

prgeor commented Mar 7, 2024

@longhuan-cisco why AOC is outside scope?

Scope of sff_mgr: 100G/40G optics (copper/aoc not in the scope)

@longhuan-cisco
Copy link
Contributor Author

@longhuan-cisco why AOC is outside scope?

Scope of sff_mgr: 100G/40G optics (copper/aoc not in the scope)

@prgeor This was answered at #383 (comment)

AFAIK, those doesn't have ability of doing tx_disable/enable

@longhuan-cisco longhuan-cisco requested a review from prgeor March 7, 2024 03:47
@prgeor
Copy link
Collaborator

prgeor commented Mar 7, 2024

@longhuan-cisco there is some conflict, can you rebase?

@longhuan-cisco
Copy link
Contributor Author

@longhuan-cisco there is some conflict, can you rebase?

@prgeor I didn't see conflicts, it just said branch "out-of-date". Anyway I updated it to latest master. please check.

@prgeor prgeor merged commit 8a5ca2b into sonic-net:master Mar 7, 2024
5 checks passed
rlhui pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Mar 12, 2024
This PR is a dependency of sonic-net/sonic-platform-daemons#383
HLD of sff_mgr: sonic-net/SONiC#1371

Why I did it
Add enable_xcvrd_sff_mgr flag support for sff_mgr
saksarav-nokia pushed a commit to saksarav-nokia/sonic-buildimage that referenced this pull request Mar 12, 2024
This PR is a dependency of sonic-net/sonic-platform-daemons#383
HLD of sff_mgr: sonic-net/SONiC#1371

Why I did it
Add enable_xcvrd_sff_mgr flag support for sff_mgr
noaOrMlnx pushed a commit to noaOrMlnx/sonic-platform-daemons that referenced this pull request Mar 14, 2024
…nic-net#383)

* SFF manager for handling QSFP+/QSFP28 transceiver modules
yuazhe pushed a commit to yuazhe/sonic-platform-daemons that referenced this pull request Jul 2, 2024
…nic-net#383)

* SFF manager for handling QSFP+/QSFP28 transceiver modules
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

10 participants