[202205] [Mellanox] Disable SSD NCQ on Mellanox platforms #17662
@yxieca PR #17662 conflicts with the MS internal repo.
/azpw ms_conflict
/azp ms_conflict
Command 'ms_conflict' is not supported by Azure Pipelines. See additional documentation.
/azpw ms_conflict
@volodymyrsamotiy Why is this PR still in draft mode? Can we move forward?
@yxieca This is an important bug fix. Could you merge it?
Backport of #17567
Why I did it
Based on our research, some products might experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ (Native Command Queuing).
The problem appears to be an incompatibility between some kernel versions and some SATA controllers.
Syslog error message examples:
Some vendors have already disabled NCQ on their platforms in SONiC because of a similar issue:
Similar issues have also been discussed on Debian/Ubuntu forums, where disabling NCQ was suggested:
Work item tracking
How I did it
Add a kernel parameter to tell libata to disable NCQ
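The description above does not quote the exact parameter. The libata option usually used for this purpose is `libata.force=noncq`, so the sketch below assumes that value. The helper name `has_noncq` is hypothetical; it only checks whether a kernel command line (by default the one in `/proc/cmdline`) asks libata to disable NCQ.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line asks libata to
# disable NCQ. Assumes the parameter is libata.force=noncq (not quoted in the PR).
# Usage: has_noncq ["<cmdline string>"]   (defaults to /proc/cmdline)
has_noncq() {
    cmdline="${1-$(cat /proc/cmdline)}"
    for arg in $cmdline; do
        case "$arg" in
            libata.force=*)
                # libata.force takes a comma-separated list of [ID:]VAL
                # entries, e.g. "noncq" or "1.00:noncq".
                old_ifs=$IFS; IFS=,
                for entry in ${arg#libata.force=}; do
                    # Strip an optional "PORT[.DEVICE]:" prefix before comparing.
                    case "${entry##*:}" in
                        noncq) IFS=$old_ifs; return 0 ;;
                    esac
                done
                IFS=$old_ifs
                ;;
        esac
    done
    return 1
}
```

On a booted switch, `has_noncq && echo "NCQ disabled on kernel command line"` would confirm that the parameter reached the running kernel.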
How to verify it
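Run an I/O stress test with the fio tool (supply a job name and target via --name and --filename as appropriate for the system under test):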
fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
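As a complementary check that is not part of the original procedure, the queue depth the kernel exposes in sysfs can hint at whether NCQ is in use. A depth of 1 typically means NCQ is off, while values up to 32 suggest it is on. The sketch below assumes the SSD shows up as an `sd*` block device; the function name `report_queue_depth` and the optional sysfs-root argument are hypothetical and exist only to make the helper easy to try out.

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <queue_depth>" for every sd* disk
# found under a sysfs root (default: /sys).
# Assumption: a depth of 1 suggests NCQ is not in use for that disk.
report_queue_depth() {
    sysfs_root="${1:-/sys}"
    for f in "$sysfs_root"/block/sd*/device/queue_depth; do
        # Skip the unexpanded glob when no matching disk exists.
        [ -r "$f" ] || continue
        dev=${f#"$sysfs_root"/block/}
        dev=${dev%%/*}
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Running `report_queue_depth` before and after the change, alongside the fio run, gives a quick indication of whether the kernel parameter took effect.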
Test results with NCQ enabled:
Test results with NCQ disabled:
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)