Summary of some kernel issues #341
To avoid modifying the upstream kernel code in ways that could affect compatibility for everyone, we have made these changes under our own platform. The code-level fixes are in our PR: https://github.com/sonic-net/sonic-buildimage/pull/15888. We welcome discussion and any suggestions on whether the kernel community has better solutions.
Thank you so much for reporting these issues and even supplying solutions. Could you please create one issue per problem?
Done.
We recently encountered several kernel driver issues on the M2-W6510-48V8C:
Problems encountered with the optoe driver
Code path: drivers/misc/eeprom/optoe.c
Issue 1: The write_timeout value is fixed at 25 ms. In testing our products, we found that 25 ms is not long enough for some optical modules, so I2C reads and writes occasionally fail.
Issue 2: Switching optoe's dev_class from 2 to 1 and then from 1 to 3 causes a crash.
Problems encountered with the tmp401 driver
Code path: drivers/hwmon/tmp401.c
Issue: We use the TMP411 temperature sensor chip, which occasionally returns abnormal temperature values (255.2℃).
Problems encountered with the i2c-i801 driver
Code path: drivers/i2c/busses/i2c-i801.c
Issue 1: The error "SMBus is busy, can't use it!" occurs intermittently, after which the I2C controller cannot be used.
Issue 2: An occasional I2C deadlock (SCL high, SDA low) cannot be recovered, leaving the I2C controller unusable.
Problems encountered with the i2c-ismt driver
Code path: drivers/i2c/busses/i2c-ismt.c
Issue: An occasional I2C deadlock (SCL high, SDA low) cannot be recovered, leaving the I2C controller unusable.
Problems encountered with the i2c-gpio driver
Code path: drivers/i2c/algos/i2c-algo-bit.c
Issue 1: An occasional I2C deadlock (SCL high, SDA low) cannot be recovered, leaving the I2C controller unusable.
Issue 2: When a faulty optical module with SDA shorted to GND is inserted, the I2C GPIO controller does not detect the fault, and all data read back is 0 (the accesses succeed at the I2C protocol level).
Problems encountered with the i2c-mux-pca954x driver
Code path: drivers/i2c/muxes/i2c-mux-pca954x.c
Issue 1: After the peripherals behind a PCA9548 are accessed, the PCA9548 channel is not closed. On our product, one I2C controller is connected to multiple PCA9548s, and each PCA9548 channel is connected to an optical module at address 0x50. Because the channel stays open after an access, I2C address conflicts occur and later accesses fail.
Issue 2: When a faulty optical module with SCL shorted to GND is connected and the PCA9548 channel it sits on is opened, SCL is pulled low. The whole I2C controller then stops working, and every I2C peripheral on that controller becomes inaccessible with no way to recover (the fault spreads beyond the faulty module).
Below are our solutions to the above issues:
Problems encountered with the optoe driver
Issue 1: This may be related to the characteristics of modules from different manufacturers. Our product currently sets write_timeout to 50 ms, which works well. Could this parameter be exposed so that users can set their own timeout (keeping 25 ms as the default so the original behavior is unchanged)?
Issue 2: After i2c_unregister_device(optoe->client[1]) is called, optoe->client[1] must be set to NULL. Otherwise, a crash occurs the next time dev_class is switched. Our fix sets the pointer to NULL right after the unregister call.
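The fix can be modeled in user space as follows, assuming (as the report implies) that the class-switch path unregisters client[1] whenever that pointer is non-NULL. mock_client, mock_unregister() and set_dev_class() are hypothetical stand-ins for struct i2c_client, i2c_unregister_device() and the driver's sysfs handler, and TWO_ADDR = 2 is assumed to be the class that needs the second address. Without the NULL assignment, the 1 to 3 switch would free the same pointer a second time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TWO_ADDR 2 /* assumed: class that needs a second client at 0x51 */

/* Hypothetical stand-in for struct i2c_client. */
struct mock_client {
    int addr;
};

static int unregister_calls;

/* Hypothetical stand-in for i2c_unregister_device(). */
static void mock_unregister(struct mock_client *c)
{
    unregister_calls++;
    free(c);
}

struct mock_optoe {
    int dev_class;
    struct mock_client *client[2];
};

static int set_dev_class(struct mock_optoe *o, int new_class)
{
    assert(o != NULL);
    if (new_class == TWO_ADDR && o->client[1] == NULL) {
        o->client[1] = malloc(sizeof(*o->client[1]));
        if (o->client[1] == NULL)
            return -1;
        o->client[1]->addr = 0x51;
    } else if (new_class != TWO_ADDR && o->client[1] != NULL) {
        mock_unregister(o->client[1]);
        /* The fix: without this, the next switch away from TWO_ADDR
         * would unregister (here: free) the same client again. */
        o->client[1] = NULL;
    }
    o->dev_class = new_class;
    return 0;
}
```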
Problems encountered with the tmp401 driver
Cause: The abnormal TMP411 temperature readings are caused by the TMP411's timeout mechanism, which is described in the TMP411 data sheet.
For now, we resolve this by modifying the driver to disable the TMP411 timeout mechanism at probe time.
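A sketch of the probe-time change, run against a mock register file. The register address 0x22 (Consecutive Alert register) and bit 7 as the SMBus timeout-enable bit are our assumptions and must be checked against the TMP411 data sheet. In the driver, the two mock accessors would be replaced by i2c_smbus_read_byte_data() and i2c_smbus_write_byte_data().

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED register map: verify against the TMP411 data sheet. */
#define TMP411_CONSEC_ALERT_REG 0x22
#define TMP411_TIMEOUT_EN       0x80

/* Mock register file standing in for the chip. */
static uint8_t regs[256];

/* Stand-ins for the SMBus byte accessors, which return a negative
 * errno on failure. */
static int mock_read_byte(uint8_t reg)
{
    return regs[reg];
}

static int mock_write_byte(uint8_t reg, uint8_t val)
{
    regs[reg] = val;
    return 0;
}

/* Called once at probe: clear the timeout-enable bit, keep the rest. */
static int tmp411_disable_timeout(void)
{
    int val = mock_read_byte(TMP411_CONSEC_ALERT_REG);

    if (val < 0)
        return val;
    if (!(val & TMP411_TIMEOUT_EN))
        return 0; /* already disabled */
    return mock_write_byte(TMP411_CONSEC_ALERT_REG,
                           (uint8_t)(val & ~TMP411_TIMEOUT_EN));
}
```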
Problems encountered with the i2c-i801 driver
Issue 1: The cause of the SMBus busy condition is unknown. As a workaround, we added controller reset logic: when SMBus busy occurs, the controller is reset so that the I2C controller can recover after the fault.
Issue 2: The cause of the I2C deadlock is unknown. We added a nine-clock recovery function: before each I2C transfer, the driver checks for the deadlock condition (SCL high, SDA low) and, if it is present, sends nine clock pulses to release the device.
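The nine-clock procedure can be sketched against a wired-AND line model: a stuck slave holds SDA low until it has clocked out its remaining bits, the master pulses SCL at most nine times until SDA is released, and then generates a STOP. The mock_bus model is hypothetical. For comparison, the Linux I2C core has a generic recovery hook (struct i2c_bus_recovery_info and i2c_recover_bus()) built on the same clock-pulsing idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical line-level model of one I2C bus segment. */
struct mock_bus {
    bool scl;            /* SCL level driven by the master */
    bool sda_master;     /* true = master releases SDA, false = drives low */
    int slave_bits_left; /* bits a stuck slave still clocks out as 0 */
};

/* Wired-AND: SDA is high only if nobody drives it low. */
static bool sda_level(const struct mock_bus *b)
{
    return b->sda_master && b->slave_bits_left == 0;
}

/* The slave advances its shift register on each SCL falling edge. */
static void set_scl(struct mock_bus *b, bool level)
{
    if (b->scl && !level && b->slave_bits_left > 0)
        b->slave_bits_left--;
    b->scl = level;
}

/* Returns the number of SCL pulses used (0 if no recovery was needed),
 * or -1 if the bus could not be released. */
static int i2c_nine_clock_recover(struct mock_bus *b)
{
    int pulses = 0;

    assert(b != NULL);
    if (sda_level(b))
        return 0;              /* bus idle: nothing to do */
    if (!b->scl)
        return -1;             /* SCL held low: clocks cannot help */

    b->sda_master = true;      /* release SDA on the master side */
    while (pulses < 9 && !sda_level(b)) {
        set_scl(b, false);
        set_scl(b, true);
        pulses++;
    }
    if (!sda_level(b))
        return -1;

    /* Generate a STOP: SDA rises while SCL is high. */
    set_scl(b, false);
    b->sda_master = false;
    set_scl(b, true);
    b->sda_master = true;
    return pulses;
}
```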
Problems encountered with the i2c-ismt driver
Code path: drivers/i2c/busses/i2c-ismt.c
Issue: The solution is the same as for the i2c-i801 driver; we also added the nine-clock unlock function.
Problems encountered with the i2c-gpio driver
Code path: drivers/i2c/algos/i2c-algo-bit.c
Issue 1: The solution is the same as for the i2c-i801 driver; we also added the nine-clock unlock function.
Issue 2: We added a voltage-level check to i2c-gpio: before a transfer begins, the driver reads the SCL and SDA levels and returns a failure if they are not both high.
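A sketch of the pre-transfer level check. bit_lines mirrors the getscl/getsda callbacks of struct i2c_algo_bit_data, where getscl is optional. The mock pins and the -EIO return value are our assumptions for illustration, not necessarily what the patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the getscl/getsda callbacks of struct i2c_algo_bit_data. */
struct bit_lines {
    int (*getscl)(void *data); /* optional: may be NULL */
    int (*getsda)(void *data);
    void *data;
};

/* Hypothetical pin state for testing. */
struct mock_pins {
    int scl;
    int sda;
};

static int mock_getscl(void *data)
{
    return ((struct mock_pins *)data)->scl;
}

static int mock_getsda(void *data)
{
    return ((struct mock_pins *)data)->sda;
}

/* An idle bus has both lines pulled high. Anything else means a device
 * (or a short to GND) is holding a line, so refuse to start a transfer. */
static int check_idle_levels(const struct bit_lines *l)
{
    assert(l != NULL && l->getsda != NULL);
    if (l->getscl != NULL && !l->getscl(l->data))
        return -EIO;
    if (!l->getsda(l->data))
        return -EIO;
    return 0;
}
```

With SDA shorted to GND, getsda() reads 0 and the transfer is rejected up front instead of returning all-zero data.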
Problems encountered with the i2c-mux-pca954x driver
Code path: drivers/i2c/muxes/i2c-mux-pca954x.c
Issue 1: By default, idle_state is MUX_IDLE_AS_IS (after the devices behind the PCA9548 are accessed, the channel is left as it is rather than closed). The default idle_state can only be changed through the device tree, and our x86 products do not support device trees. We therefore decided to modify the driver code directly and change the default idle_state to MUX_IDLE_DISCONNECT.
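The effect of the idle policy can be shown with two mocked PCA9548s sharing one controller, each with a module at 0x50 behind channel 0. The mock types are hypothetical; MUX_IDLE_AS_IS and MUX_IDLE_DISCONNECT are given the values -1 and -2, which we assume match the kernel's mux headers. With MUX_IDLE_AS_IS, the second access sees two devices answering at 0x50; with MUX_IDLE_DISCONNECT, only one.

```c
#include <assert.h>
#include <stdint.h>

#define MUX_IDLE_AS_IS      (-1)
#define MUX_IDLE_DISCONNECT (-2)
#define NUM_MUXES 2

/* Hypothetical model: bit n of ctrl enables channel n; every mux has
 * an optical module at 0x50 behind channel 0. */
struct mock_mux {
    uint8_t ctrl;
    int idle_state;
};

static int responders_at_0x50(const struct mock_mux *m, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (m[i].ctrl & 0x01)
            count++;
    return count;
}

/* Select channel 0 on one mux, "transfer", then apply its idle policy.
 * Returns how many devices answered at 0x50 (more than 1 = conflict). */
static int access_module(struct mock_mux *m, int n, int mux)
{
    int answered;

    assert(mux >= 0 && mux < n);
    m[mux].ctrl = 1u << 0;
    answered = responders_at_0x50(m, n);
    if (m[mux].idle_state == MUX_IDLE_DISCONNECT)
        m[mux].ctrl = 0; /* deselect: close all channels */
    return answered;
}
```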
Issue 2: We added PCA9548 reset logic to the channel-close path. If closing the PCA9548 channel fails, the PCA9548 is reset (which closes all channels) to isolate the faulty device.
Does the kernel community have better solutions or suggestions for these issues? Thanks.