-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fpmsyncd Next Hop Table Enhancement HLD #1425
Conversation
doc/fpmsyncd/hld_fpmsyncd-NTT.md
Outdated
--> | ||
|
||
There should be no impact to Warmboot and Fastboot Design Impact. | ||
The change is how fpmsyncd handle message from zebra. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the nexthop group id stable? this could have impact to warm reboot, can you confirm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment. I will check and update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion during the last Routing WG meeting, I have added description that Warmboot and Fastboot is NOT supported when this feature is enabled. (It works same as today if disabled which is default config)
doc/fpmsyncd/hld_fpmsyncd-NTT.md
Outdated
We can disable `net.ipv4.nexthop_compat_mode` (set to 0) for performance optimization. | ||
|
||
For backward comptibility, we can keep `fpm use-next-hop-groups` disabled by default in FRR configuration. | ||
This way, zebra would not send `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` to fpmsyncd and entry created in `APPL_DB` should be identical to when this NHG feature did not exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need sonic config db to enable and disable this feature, and this new feature should be disable by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. Yes, we intend to make this disabled by default.
We are also studying how to make this configurable using config db.
I will update the HLD and make it clear in the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion during the last Routing WG meeting, I have added requirement to disable/enable this via CONFIG_DB in the Configuration and management section.
Will update more details like data models after further investigating how this should be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated Configuration and management section with description of how this feature will be disabled(default) and enabled using CONFIG_DB and config_db.json
. One can directly edit config_db.json
file or use Klish CLI to enable/disable this feature. Also, corresponding test case are added.
--> | ||
|
||
This design modifies `fpmsyncd` to use the new `APPL_DB` tables. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed during the review, it is more about move of logic of NHG creation from SWSS (Route/NHG OA) to Zebra. What is the main motivation for this move ? How such features as Dynamically Ordered ECMP (used by MSFT) with per NHGm's Sequence number work with this move ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the main motivation for this move ?
Motivation is to reduce NHG/ROUTE translation in zebra.
Also, supporting NHG in fpmsyncd as a first step to improve update performance (reducing DB change) by zebra to update a NHG shared among routes instead of updating all ECMP routes in ROUTE entry. (e.g., when NHG member was reduced from n to n-1)
How such features as Dynamically Ordered ECMP (used by MSFT) with per NHGm's Sequence number work with this move ?
My understanding is this would not impact people using SWSS to create NHG if the feature is disabled (by default) since fpm will not send RTM_NEWNEXTHOP
and thus fpmsyncd will set entry in CONFIG_DB using ROUTE entry as today.
Do you expect anyone to use Dynamically Ordered ECMP / per NHGm's Sequence number while enabling NHG in fpmsyncd in near future? (While I agree that it would be nice to have both available in long term but could be a different PR in future)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use case would be BGP PIC, and recursive routes handling. BGP PIC has started in design discussion in routing WG. Recursive routes support would be discussed after.
https://lists.sonicfoundation.dev/g/sonic-wg-routing/wiki/34786
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated "Scope" for clarification and added PIC example in "Overview" section.
- When `fpmsyncd` receive the `RTM_NEWROUTE` on sequence 4, the process will write the NextHop Group with ID 118 to the `NEXTHOP_GROUP_TABLE` using the information of gateway and interface from the NextHop Group events with IDs 116 and 117. | ||
- Then `fpmsyncd` will create a new route entry to `ROUTE_TABLE` with a `nexthop_group` field with value `ID118`. | ||
- When `fpmsyncd` receives the last `RTM_NEWROUTE` on sequence 5, the process will create a new route entry (but no NextHop Group entry) in `ROUTE_TABLE` with `nexthop_group` field with value `ID118`. (Note: This NextHop Group entry was created when the `fpmsyncd` received the event sequence 4.) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will Zebra/FRR re-create the NHG (delete old one + create new one) and update all routes with new ID on change of NHG (e.g. due to failed link to NHGm) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if there are one or more route referring to the NHG with members before the change.
(In this case we will have an old NHG with n members referenced from some routes and a newly created NHG n-x members referenced from other routes)
I think it's No if there are no routes referencing to the NHG with members before the change.
But I'm not 100% confident so let me check again and update the result here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would depend on NHG type (BGP vs IGP), the type of triggered events and how the trigger events get handled in zebra.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added explanation that NHG ID and members are managed by FRR in "Overview" section for clarification.
|
||
- FRR configuration | ||
- (1) config zebra to use `dplane_fpm_nl` instead of `fpm` module (this is default since 202305 release) | ||
- (2) set `fpm use-nexthop-groups` option (this is disabled by default and enabled via `CONFIG_DB`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it assume the use of Unified Management Framework ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this SONiC_Design_Doc_Unified_FRR_Mgmt_Interface.md the Unified Management Framework you have asked?
If yes, we are using sonic-cfggen and zebra.conf.j2
template to generate zebra config file based on CONFIG_DB entry.
I have PR the source code |
|
||
### Configuration and management | ||
|
||
This NextHop Group feature is enabled/disabled by config option of zebra (BGP container): `[no] fpm use-next-hop-groups` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you follow example here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. @nakano-omw is working on moving enabled/disabled config under DEVICE_METADATA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated both HLD and Source Code to config use DEVICE_METADATA.
@ebiken-ntt do you plan to add NHG update handling when local link is down? i.e. take some benefits from using NHG. What's your plan on that? |
No, at least not for the 202311. We expect FRR to notify any NHG update via dplane_fpm_nl including ones due to local link down. But we can discuss for future release if there are any idea. |
Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
Hi @zhangyanzhao @eddieruan-alibaba |
I have approved this HLD. You could merge it. We will continue this path further to PIC and recursive routes convergence support. |
@eddieruan-alibaba I can't merge because I don't have enough permissions. Would you be able to grant me the permission? |
@eddieruan-alibaba can you run merge ? |
@eddieruan-alibaba @lguohan @StormLiangMS @ntt-omw Due to recent performance issues found in fib suppression feature we plan to disable the feature in 202405. The dplane_fpm_nl is tied to fib suppression feature and would not be available if fib suppression feature is removed. |
@dgsudharsan Could you please elaborate on the performance issue? Is it a FRR issue? |
Yes. Its an FRR issue. If there are large number of routes and if link flap happens resulting in multiple withdraw and advertisement of the routes, zebra's processing queue will be built up resulting in zebra consuming memory in GBs. This memory will not get freed to the OS (due to low level libc implementation) although its freed by zebra at the end. So systems with lower memory might get impacted and would eventually run out of memory. |
@dgsudharsan do we have an issue opened for this issue? We could discuss this issue in routing WG. Alibaba saw this issue as well. I could ask someone from my team to discuss our workaround in FRR. |
@nakano-omw where are you on the work which @hasan-brcm and NTT worked on to further enhancing NHG handling? |
@dgsudharsan But why fib suppression feature is removed. This issue is independent to this feature, right? |
@eddieruan-alibaba We removed it in 202305 and 202311 sonic-net/sonic-buildimage#17660 . With this feature since there is a delay zebra queue builds up. |
The NHG support we are developing will not work if dplane_fpm_nl is disabled because it is required. |
@nakano-omw @dgsudharsan I plan to keep dplane_fpm_nl for 202405 |
I have submitted a description of the issue to the FRR community: FRRouting/frr#15016 |
review is on-going, keep it in 202405 release for now. |
@nakano-omw @lguohan @eddieruan-alibaba Stepan did some experiments and confirmed that there is no memory issue with dplane_fpm_nl without fib suppression and the reason for memory increase is due to fib suppression feature itself. We will not revert dplane_fpm_nl but disable fib suppression feature alone. |
code PRs are not approved yet, move to backlog |
code prs are not merged yet, move to backlog |
Fpmsyncd Next Hop Table Enhancement HLD