You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While reading through the implementation of the swap using move, I noticed a corner case that could, if I'm not mistaken, brick the device in case a power failure occurs during a revert process.
Let's suppose after an upgrade you have a non-functional image in the primary slot. The image won't be confirmed, leading to a revert at next boot. At the beginning of the revert process, fixup_revert is invoked, which rewrites the trailer in the secondary slot so that the revert looks like a permanent upgrade. Normally, after the execution of this routine, the secondary slot has a valid trailer, in particular with a valid magic number.
Let's imagine a power failure occurs during the writing of the trailer's magic, i.e. in boot_write_magic (see here). The magic number in the secondary slot is in an undefined state and might be partially written, which implies at next boot it will be considered in BOOT_MAGIC_BAD state.
This doesn't match any state leading to an upgrade or revert process to be initiated, so boot_swap_type_multi will return BOOT_SWAP_TYPE_NONE. This means MCUboot will not perform the revert and attempt to boot from the primary slot, containing a non-functional image. Hence, the device is bricked unless it is possible to reflash the secondary slot without a functional image.
Note at the time fixup_revert is called, the boot status is virgin, meaning boot_status_is_reset(bs) is true at next boot so MCUboot won't detect a revert process was interrupted and won't try to resume it.
How to reproduce
To easily reproduce the issue, it is possible to modify the boot_write_magic routine to write the magic in two calls to flash_area_write. For instance, it is possible to replace:
And to to run the tests for swap-move on the simulator:
cargo test --features "swap-move"
During the revert_with_fails test, a power failure is simulated during every modification to the flash memory, so in particular between the two flash_area_write in boot_write_magic. This will lead to an error:
[ERROR bootsim::image] Revert failed at interruption 5
[...]
[ERROR bootsim::image] Revert failed at interruption 236
test revert_with_fails ... FAILED
[...]
---- revert_with_fails stdout ----
thread 'revert_with_fails' panicked at sim/tests/core.rs:57:1:
assertion failed: !image.run_revert_with_fails()
I didn't try on real hardware, but I'm happy to do so if you consider that useful.
The text was updated successfully, but these errors were encountered:
taltenbach
changed the title
swap-move: Power failure during magic writing could brick the device
swap-move: Power failure during the writing of the magic could brick the device
May 27, 2024
I had some time to look a bit further into this issue. A very simple fix would be to modify the boot_swap_tables in bootutil_public.c (see here) so that a revert is performed whatever the state of the magic number in the secondary slot. This means we would have:
Intuitively, this sounds good to me: a revert is performed if and only if the copy-done flag is set in the primary slot but the image-ok flag is not. I also tested this change on the simulator and all tests passed.
@utzig@d3zd3z Are you aware of any reason why the BOOT_MAGIC_UNSET state is currently required for the secondary slot's trailer?
Description
While reading through the implementation of the swap using move, I noticed a corner case that could, if I'm not mistaken, brick the device in case a power failure occurs during a revert process.
Let's suppose after an upgrade you have a non-functional image in the primary slot. The image won't be confirmed, leading to a revert at next boot. At the beginning of the revert process,
fixup_revert
is invoked, which rewrites the trailer in the secondary slot so that the revert looks like a permanent upgrade. Normally, after the execution of this routine, the secondary slot has a valid trailer, in particular with a valid magic number.Let's imagine a power failure occurs during the writing of the trailer's magic, i.e. in
boot_write_magic
(see here). The magic number in the secondary slot is in an undefined state and might be partially written, which implies at next boot it will be considered inBOOT_MAGIC_BAD
state.So, at next boot, we have the following state:
This doesn't match any state leading to an upgrade or revert process to be initiated, so
boot_swap_type_multi
will returnBOOT_SWAP_TYPE_NONE
. This means MCUboot will not perform the revert and attempt to boot from the primary slot, containing a non-functional image. Hence, the device is bricked unless it is possible to reflash the secondary slot without a functional image.Note at the time
fixup_revert
is called, the boot status is virgin, meaningboot_status_is_reset(bs)
is true at next boot so MCUboot won't detect a revert process was interrupted and won't try to resume it.How to reproduce
To easily reproduce the issue, it is possible to modify the
boot_write_magic
routine to write the magic in two calls toflash_area_write
. For instance, it is possible to replace:mcuboot/boot/bootutil/src/bootutil_public.c
Line 307 in 45d379e
With:
And to to run the tests for swap-move on the simulator:
During the
revert_with_fails
test, a power failure is simulated during every modification to the flash memory, so in particular between the twoflash_area_write
inboot_write_magic
. This will lead to an error:I didn't try on real hardware, but I'm happy to do so if you consider that useful.
The text was updated successfully, but these errors were encountered: