Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RT1020 OTA: removed unnecessary compiler optimisation #2952

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

robertc2000
Copy link
Collaborator

@robertc2000 robertc2000 commented Nov 4, 2024

The compiler decided to use memset/memcpy in flexspi_nor_get_config function, which will lead to a crash during the firmware swap when we erase the sector in which stdlib resides.

I rewrote this function to make sure this optimization does not happen.

Removed sector allocation from single_bank_swap.

Added MG_IRAM to flash_wait.

@@ -212,8 +212,11 @@ MG_IRAM static int flexspi_nor_get_config(
.ipcmdSerialClkFreq = 1,
.blockSize = 64 * 1024,
.isUniformBlockSize = false};

*config = default_config;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, struct copy usually ends up in stdlib calls...

src/ota_imxrt.c Outdated
Comment on lines 270 to 273
MG_ARM_DISABLE_IRQ();
ret = (flexspi_nor_get_config(&config) == 0);
if (ret) ret = flash_erase(&config, addr);
MG_ARM_ENABLE_IRQ();
if (!s_flash_irq_disabled) MG_ARM_ENABLE_IRQ();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this one disabled through #if 0 is intended to be a standalone call to just erase flash, it is not meant to be called in the OTA process itself, just if we reinstall our data load/save schema in the future.
As it works by itself, it needs to reenable IRQs when exiting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTA code only calls flash_erase(), not this function

src/ota_imxrt.c Outdated
struct mg_flexspi_nor_config *config) {
struct mg_flexspi_nor_config default_config = {
struct mg_flexspi_nor_config *config) {
static struct mg_flexspi_nor_config default_config = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does static make a difference ? If it is dynamic, the compiler should generate a bunch of code to fill the struct in the stack, then memory released. If it is static, the compiler will probably init this function at startup and memory be used forever.

Does our nice compiler use stdlib calls to fill this struct in the stack ?

Then I think our copy doesn't make much sense, we should probably declare this as module static (not function static) and name it and/or change the code to access this data. (With a nice comment remembering us that this must reside in RAM)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do not use static, then the structure will be allocated on the stack, the compiler uses memset to zero it, and then it initializes it with the corresponding values. If we use 'static', memset is no longer used, the variable is probably initialized as global and the code just loads its address.

Here is the generated code when static is not used. Note that it reserves space for the variable on stack and then clears it with memset and then proceeds to fill it with data.

00000e1c <flexspi_nor_get_config>:
     e1c:	b510      	push	{r4, lr}
     e1e:	f5ad 7d02 	sub.w	sp, sp, #520	; 0x208
     e22:	f44f 7200 	mov.w	r2, #512	; 0x200
     e26:	2100      	movs	r1, #0
     e28:	4604      	mov	r4, r0
     e2a:	ab02      	add	r3, sp, #8
     e2c:	4618      	mov	r0, r3
     e2e:	f000 f9b3 	bl	1198 <__memset_veneer>
     e32:	4924      	ldr	r1, [pc, #144]	; (ec4 <flexspi_nor_get_config+0xa8>)
     e34:	4a24      	ldr	r2, [pc, #144]	; (ec8 <flexspi_nor_get_config+0xac>)
..........

When static is used, the code is this:

0000101c <flexspi_nor_get_config>:
    101c:	b082      	sub	sp, #8
    101e:	2300      	movs	r3, #0
    1020:	4a08      	ldr	r2, [pc, #32]	; (1044 <flexspi_nor_get_config+0x28>)
    1022:	9301      	str	r3, [sp, #4]
    1024:	9b01      	ldr	r3, [sp, #4]
    1026:	f5b3 7f00 	cmp.w	r3, #512	; 0x200
    102a:	d302      	bcc.n	1032 <flexspi_nor_get_config+0x16>
    102c:	2000      	movs	r0, #0
    102e:	b002      	add	sp, #8
    1030:	4770      	bx	lr
    1032:	9901      	ldr	r1, [sp, #4]
    1034:	9b01      	ldr	r3, [sp, #4]
    1036:	5c51      	ldrb	r1, [r2, r1]
    1038:	54c1      	strb	r1, [r0, r3]
    103a:	9b01      	ldr	r3, [sp, #4]
    103c:	3301      	adds	r3, #1
    103e:	9301      	str	r3, [sp, #4]
    1040:	e7f0      	b.n	1024 <flexspi_nor_get_config+0x8>
    1042:	bf00      	nop
    1044:	00000008 	.word	0x00000008

Regarding the copy,

*config = default_config; always results in the compiler using memcpy, like you said.

So I am using a byte-by-byte copy in order to avoid the compiler inventing another optimization with 'memcpy'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking on an extra indirection level, something like this (untested, just wrote...):

Screenshot from 2024-11-04 11-47-33
Screenshot from 2024-11-04 11-44-19

That has no copy, no side effects.

mongoose.c Outdated
@@ -5702,7 +5712,7 @@ MG_IRAM static bool flash_erase(struct mg_flexspi_nor_config *config,
#if 0
// standalone erase call
MG_IRAM static bool mg_imxrt_erase(void *addr) {
struct mg_flexspi_nor_config config;
struct mg_flexspi_nor_config *config;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see write

mongoose.c Outdated
MG_IRAM static bool mg_imxrt_write(void *addr, const void *buf, size_t len) {
struct mg_flexspi_nor_config config;
struct mg_flexspi_nor_config *config;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for RT1020, but will fail for RT1060, because there we need to have storage for the ROM function to fill, that's why I declared the struct and a pointer to it.
The struct is allocated in the stack;
for the RT1020, it is not used and freed at the end of the function call
for the RT1060, it is filled by the ROM call and freed at the end of the function call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you are right. My bad. Let me change that.

@scaprile scaprile merged commit 192f5e5 into master Nov 5, 2024
64 of 76 checks passed
@scaprile scaprile deleted the rt1020-ota-fix branch November 5, 2024 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants