SPI CTRL0 register access unsafe - race condition can cause spurious transmissions #713

gavanfantom · 2023-08-17T12:55:07Z

The SPIn_CTRL0 register contains the '''start''' field, which triggers the start of a transaction. The user guide cautions that this bit should not be set until pending transactions have finished. The SPI driver performs a read/modify/write operation on this register while a transaction is in progress, in order to change different fields.

This read/modify/write causes a 1 bit to be written to the '''start''' field if a transaction was in progress when the read was performed. It's not clear what happens if the 1 bit is written back while the transaction is still in progress, but it appears to be harmless. However, if the transaction finishes after the register is read but before it is written back, this will start a new transaction in the hardware.

It just so happens that on the MAX32670, if the instruction cache is enabled, and a 1 byte read/write transaction is performed, it hits this race condition and causes 2 bytes to be sent. Disabling the instruction cache works around the issue, but the underlying problem is still present. It is not safe to do a read/modify/write on this register when a SPI transaction is in progress, unless the '''start''' field is cleared to zero before the value is written back.

While I've found this with the spi_reva1.c driver, a quick read of the reva2 driver suggests that this issue may affect that driver too.

Jake-Carter · 2023-08-18T21:22:39Z

Thanks for reporting this.

The most likely culprit is MXC_SPI_RevA1_MasterTransHandler.

This logic seems terrible...

The start bit is labeled as R/W1O, so clearing it to 0 as a solution should be safe and have no effect on device operation.

We're working on improved SPI drivers in #662. These v2 drivers have done away with the poorly placed ctrl0 interactions for SS assertion/deassertion, though we should fix this for v1 too.

@sihyung-maxim FYI v2 might still be susceptible to this here and here, but since we clear the EN bit beforehand it seems a lot less likely.

Jake-Carter · 2023-08-18T21:28:49Z

@sihyung-maxim I'm linking the PR as a potential candidate to close this. We should test the reported condition with v2:

ICC enabled
TX length = 1 byte
RX length = 1 byte

Bug: TX byte is transmitted twice

gavanfantom · 2023-08-20T21:06:21Z

The most likely culprit is MXC_SPI_RevA1_MasterTransHandler.

This logic seems terrible...

That is exactly the problem, yes.

The start bit is labeled as R/W1O, so clearing it to 0 as a solution should be safe and have no effect on device operation.

That's right. I tested exactly that in those two places, and the fix worked:

    spi->ctrl0 = (spi->ctrl0 & ~MXC_F_SPI_REVA_CTRL0_START) | MXC_F_SPI_REVA_CTRL0_SS_CTRL;

    spi->ctrl0 &= ~(MXC_F_SPI_REVA_CTRL0_START | MXC_F_SPI_REVA_CTRL0_SS_CTRL);

sihyung-maxim · 2023-08-21T19:46:11Z

@sihyung-maxim I'm linking the PR as a potential candidate to close this. We should test the reported condition with v2:

ICC enabled

TX length = 1 byte

RX length = 1 byte

Bug: TX byte is transmitted twice

This test case passes for v2. TX byte is only transmitted once. However, I will add a check to ensure a transaction is not in progress before setting the start bit.

As a side note: The v1 (non-DMA) implementation works by restarting the transaction every 32 messages. The MXC_SPI_RevA1_MasterTransHandler(...) function can only handle up to 32 messages per function call (the 32 messages limitation being the size of the SPI FIFO). This results in MXC_SPI_RevA1_MasterTransHandler(...) getting called several times for transaction lengths greater than 32 messages which means the SPI_CTRL0.ss_ctrl and SPI_CTRL0.start bits are repeatedly set or cleared every 32 messages.

The v2 does not restart the SPI block over the course of a transaction, so the SPI_CTRL0.ss_ctrl and SPI_CTRL0.start fields are only set/cleared once at the beginning of each transaction.

khpeterson · 2023-08-22T18:05:35Z

Are there other read/modify/writes of spi->ctrl0 that are vulnerable to this problem? I'm thinking of a case where a master DMA transaction follows a master transaction, but maybe clearing START as described above is enough.

khpeterson · 2023-08-23T23:30:43Z

FWIW, this fix resolves a case of "stuck DMA" that I have been fighting for a while. Occasionally, my app ends up in a state where SPI thinks its done but DMA is waiting to complete:

(gdb) p/x *MXC_SPI1
$35 = {{data32 = 0x0, data16 = {0x0, 0x0}, data8 = {0x0, 0x0, 0x0, 0x0}}, ctrl0 = 0x10003, ctrl1 = 0x1820182, ctrl2 = 0x800, ss_time = 0x10101, clk_cfg = 0x202, rsv_0x18 = 0x0, dma = 0x80408041, int_fl = 0x8807, int_en = 0x0, wake_fl = 0x0, wake_en = 0x0, stat = 0x0}
(gdb) p/x *MXC_DMA1
$36 = {cn = 0x3, intr = 0x0, rsv_0x8_0xff = {0x0 <repeats 62 times>}, ch = {{cfg = 0x80040210, st = 0x0, src = 0x20003886, dst = 0x14, cnt = 0x0, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}, {cfg = 0x80400011, st = 0x1, src = 0x20003724, dst = 0x20003a04, cnt = 0x4, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}, {cfg = 0x0, st = 0x0, src = 0x0, dst = 0x0, cnt = 0x0, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}, {cfg = 0x0, st = 0x0, src = 0x0, dst = 0x0, cnt = 0x0, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}, {cfg = 0x0, st = 0x0, src = 0x0, dst = 0x0, cnt = 0x0, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}, {cfg = 0x0, st = 0x0, src = 0x0, dst = 0x0, cnt = 0x0, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}, {cfg = 0x0, st = 0x0, src = 0x0, dst = 0x0, cnt = 0x0, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}, {cfg = 0x0, st = 0x0, src = 0x0, dst = 0x0, cnt = 0x0, src_rld = 0x0, dst_rld = 0x0, cnt_rld = 0x0}}}

Without the fix this my app gets into this state around 4 times in 8 hours of continuous operation; with it I have been able to run for nearly 24 hours with a single occurrence.

Thank you @gavanfantom!

- Clear START bit to 0 before writing back to register - #713

Jake-Carter added the bug Something isn't working label Aug 18, 2023

Jake-Carter linked a pull request Aug 18, 2023 that will close this issue

feat(Examples,PeriphDrivers)!: Add SPI v2 Target Transaction functions #662

Merged

6 tasks

khpeterson added a commit to khpeterson/analog-devices-msdk-fork that referenced this issue Aug 22, 2023

quick pick up fix for analogdevicesinc#713

19f1257

Jake-Carter added a commit that referenced this issue Sep 21, 2023

Fix race condition on ctrl0 read-modify-writes in spi_reva1

6b7b322

- Clear START bit to 0 before writing back to register - #713

sihyung-maxim closed this as completed in #662 Oct 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPI CTRL0 register access unsafe - race condition can cause spurious transmissions #713

SPI CTRL0 register access unsafe - race condition can cause spurious transmissions #713

gavanfantom commented Aug 17, 2023

Jake-Carter commented Aug 18, 2023

Jake-Carter commented Aug 18, 2023

gavanfantom commented Aug 20, 2023

sihyung-maxim commented Aug 21, 2023

khpeterson commented Aug 22, 2023 •

edited

Loading

khpeterson commented Aug 23, 2023

SPI CTRL0 register access unsafe - race condition can cause spurious transmissions #713

SPI CTRL0 register access unsafe - race condition can cause spurious transmissions #713

Comments

gavanfantom commented Aug 17, 2023

Jake-Carter commented Aug 18, 2023

Jake-Carter commented Aug 18, 2023

gavanfantom commented Aug 20, 2023

sihyung-maxim commented Aug 21, 2023

khpeterson commented Aug 22, 2023 • edited Loading

khpeterson commented Aug 23, 2023

khpeterson commented Aug 22, 2023 •

edited

Loading