Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA_SignalEvent has undefined behaviour #218

Closed
fhars opened this issue May 25, 2021 · 6 comments
Closed

OTA_SignalEvent has undefined behaviour #218

fhars opened this issue May 25, 2021 · 6 comments

Comments

@fhars
Copy link

fhars commented May 25, 2021

This line:

err = otaAgent.pOtaInterface->os.event.send( NULL, pEventMsg, 0 );

may dereference a null pointer, especially after an OTA_Shutdown(ticks, false). (Note that checking that pOtaInterface is valid before the call is not sufficient here, as the shutdownHandler might memset otaAgent to zero between the check and the call). Another possible race condition is in the called send method, as (in the FreeRTOS case) the queue itself might be deinitialized by the time execution reaches the configAssert( pxQueue ); in xQueueGenericSend, although this error path is currently masked by #217.

@fhars fhars changed the title OTA_SignalEvent has undefiend behaviour OTA_SignalEvent has undefined behaviour May 25, 2021
@fhars
Copy link
Author

fhars commented May 25, 2021

The same thing in OTA_Suspend:

    /* Check if OTA Agent is running. */
    if( otaAgent.state != OtaAgentStateStopped )
    {
        /* Stop the request timer. */
        ( void ) otaAgent.pOtaInterface->os.timer.stopTimer( OtaRequestTimer );

and in OTA_ActivateNewImage (this time with the insufficient check):

    if( ( otaAgent.pOtaInterface != NULL ) && ( otaAgent.pOtaInterface->pal.activate != NULL ) )

@divekarshubham
Copy link
Contributor

Thank you for bringing these issues to our attention. We will work on adding further checks for these functions.

@lightblu
Copy link

lightblu commented Jan 12, 2022

+1

Want to add that I believe this doesn't even need such a hard/tight preemptive race like theorized above, but could even happen with e.g. cooperative FreeRTOS timer race, because a timer may just expire when OTA_Shutdown is called, and the timer callback then stills gets called afterwards.

That was my theory for some time, until I realized OTA_Shutdown does not clear started timers at all.

@archigup
Copy link
Member

Hi, apologies for the delay. OTA_Shutdown has now been updated to clear started timers here: #413.
We are investigating additional fixes needed to resolve this issue.

@lightblu
Copy link

Thanks.
Also want to point out, there is even another simple path how to get this undefined behaviour, which is via the packet ingestion. So imagine you have called OTA_Shutdown, but your Mqtt module is still receiving a late OTA message it then passes on to the OTA library via OTA_SignalEvent..

(One could sure say you shouldn't do pass packets to OTA after you have called the shutdown method, but better would be to also guard this path. It at least needs to be documented, and there I also have to agree with fhars' #224 - it is not only not well abstracted but also not at all documented. "So one has to rifle through the source of the library" also matches my memory of getting that part right.

@ActoryOu
Copy link
Member

PR #440 provides the solution for this. Once the OTA_Init called, the function pointer is set and never reset even it's shutdowned.
So there shouldn't be any NULL function pointer accessed with this patch.

I'm going to close this issue. Any feedback is welcome.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants