Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMP notifications corrupted on apollo3 #161

Open
simonratner opened this issue Jun 1, 2022 · 3 comments
Open

SMP notifications corrupted on apollo3 #161

simonratner opened this issue Jun 1, 2022 · 3 comments

Comments

@simonratner
Copy link

simonratner commented Jun 1, 2022

SMP notifications used to deliver SMP responses over BLE transport are corrupted on the apollo3_evb.

Below is a sample log from nRF Connect app attempting an image list command (as part of the DFU process). You can see the response being fragmented over two notifications, with the second notification corrupted. As a result, the concatenated CBOR is not decodable. The response does not need to be corrupted - the issue exists even for single-packet responses.

This is reproducible with stock bleprph on apollo3_evb.
This is NOT reproducible with the same app on nordic_pca10056.

Error can be observed with the nRF Connect app or the nRF Device Manager app for Android/iOS (CBOR Error).

V	17:20:50.573	Connecting to 66:77:88:23:BB:EF...
D	17:20:50.573	gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, preferred PHY = LE 1M)
D	17:20:56.145	[Callback] Connection state changed with status: 133 and new state: DISCONNECTED (0)
E	17:20:56.150	Error 133 (0x85): GATT ERROR
I	17:20:56.150	Disconnected
D	17:20:58.707	gatt.close()
D	17:20:58.709	wait(200)
V	17:20:58.910	Connecting to 66:77:88:23:BB:EF...
D	17:20:58.910	gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, preferred PHY = LE 1M)
D	17:21:02.715	[Callback] Connection state changed with status: 0 and new state: CONNECTED (2)
I	17:21:02.715	Connected to 66:77:88:23:BB:EF
V	17:21:02.715	Requesting new MTU...
D	17:21:02.715	gatt.requestMtu(517)
I	17:21:03.159	Connection parameters updated (interval: 7.5ms, latency: 0, timeout: 5000ms)
I	17:21:03.391	MTU changed to: 256
V	17:21:03.398	Discovering services...
D	17:21:03.398	gatt.discoverServices()
D	17:21:03.404	[Callback] Services discovered with status: 0
I	17:21:03.404	Services discovered
V	17:21:03.406	Generic Access (0x1800)
- Device Name [R] (0x2A00)
- Appearance [R] (0x2A01)
Device Information (0x180A)
- Model Number String [R] (0x2A24)
- Serial Number String [R] (0x2A25)
- Firmware Revision String [R] (0x2A26)
- Manufacturer Name String [R] (0x2A29)
SMP Service (8d53dc1d-1db7-4cd3-868b-8a527460aa84)
- SMP Characteristic [N WNR] (da2e7828-fbce-4e01-ae9e-261174997c48)
   Client Characteristic Configuration (0x2902)
I	17:21:03.474	Connection parameters updated (interval: 45.0ms, latency: 0, timeout: 5000ms)
V	17:21:08.902	[McuMgr] Connecting...
D	17:21:08.905	[McuMgr] gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, LE 1M)
D	17:21:08.953	[McuMgr] [Callback] Connection state changed with status: 0 and new state: 2 (CONNECTED)
I	17:21:08.958	[McuMgr] Connected to 66:77:88:23:BB:EF
D	17:21:08.962	[McuMgr] wait(300)
I	17:21:08.966	[McuMgr] MTU changed to: 256
V	17:21:09.266	[McuMgr] Discovering services...
D	17:21:09.271	[McuMgr] gatt.discoverServices()
I	17:21:09.276	[McuMgr] Services discovered
V	17:21:09.286	[McuMgr] Primary service found
V	17:21:09.290	[McuMgr] Requesting new MTU...
D	17:21:09.292	[McuMgr] gatt.requestMtu(498)
I	17:21:09.370	[McuMgr] MTU changed to: 256
D	17:21:09.374	[McuMgr] gatt.setCharacteristicNotification(da2e7828-fbce-4e01-ae9e-261174997c48, true)
V	17:21:09.378	[McuMgr] Enabling notifications for da2e7828-fbce-4e01-ae9e-261174997c48
D	17:21:09.381	[McuMgr] gatt.writeDescriptor(00002902-0000-1000-8000-00805f9b34fb, value=0x01-00)
I	17:21:09.459	[McuMgr] Data written to descr. 00002902-0000-1000-8000-00805f9b34fb, value: (0x) 01-00
I	17:21:09.464	[McuMgr] Notifications enabled
A	17:21:09.470	[McuMgr] Sending (10 bytes) Header (Op: 0, Flags: 0, Len: 2, Group: 1, Seq: 0, Command: 0) CBOR {}
V	17:21:09.480	[McuMgr] Writing characteristic da2e7828-fbce-4e01-ae9e-261174997c48 (WRITE COMMAND)
D	17:21:09.484	[McuMgr] gatt.writeCharacteristic(da2e7828-fbce-4e01-ae9e-261174997c48)
I	17:21:09.489	[McuMgr] Data written to da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 00-00-00-02-00-01-00-00-BF-FF
I	17:21:09.648	[McuMgr] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 01-00-00-FC-00-01-00-00-BF-66-69-6D-61-67-65-73-9F-BF-64-73-6C-6F-74-00-67-76-65-72-73-69-6F-6E-69-31-2E-30-2E-30-2E-34-37-39-64-68-61-73-68-58-20-B2-27-75-77-F3-C1-C3-BF-E5-AA-26-64-7C-DF-77-AD-26-27-C8-EC-58-E3-C5-DC-8D-3B-97-B6-51-9E-73-BC-68-62-6F-6F-74-61-62-6C-65-F5-67-70-65-6E-64-69-6E-67-F4-69-63-6F-6E-66-69-72-6D-65-64-F5-66-61-63-74-69-76-65-F5-69-70-65-72-6D-61-6E-65-6E-74-F4-FF-BF-64-73-6C-6F-74-01-67-76-65-72-73-69-6F-6E-69-31-2E-30-2E-30-2E-34-37-39-64-68-61-73-68-58-20-9F-46-47-02-D7-E9-BA-FD-57-69-BF-73-42-88-86-85-A0-1B-66-C1-B7-84-48-0A-E3-E8-F8-B5-12-95-80-F7-68-62-6F-6F-74-61-62-6C-65-F5-67-70-65-6E-64-69-6E-67-F4-69-63-6F-6E-66-69-72-6D-65-64-F4-66-61-63-74-69-76-65-F4-69-70-65-72-6D-61-6E-65-6E-74-F4-FF-FF-6B-73-70-6C-69-74-53
I	17:21:10.226	[McuMgr] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 74-53-00-FC-00-01-00
V	17:21:10.246	[McuMgr] Disconnecting...
D	17:21:10.253	[McuMgr] gatt.disconnect()
D	17:21:10.289	[McuMgr] [Callback] Connection state changed with status: 0 and new state: 0 (DISCONNECTED)
I	17:21:10.294	[McuMgr] Disconnected
D	17:21:10.299	[McuMgr] gatt.close()

The expected second notification (at t=17:21:10.226) should be:

74-61-74-75-73-00-FF
@simonratner
Copy link
Author

cc @t3zeng

@t3zeng
Copy link

t3zeng commented Jun 1, 2022

Hey guys, you can reproduce this by running the bleprph app and then try to perform an image read.

Repro steps:

  1. Load bleprph app onto apollo3 evb and boot up
  2. On your phone, go to nRF Connect Device Manager and look for the evb
  3. Go to the Image tab at the bottom and then go to the top right and press Advanced
  4. A new Images section will pop up. Press read
  5. At this point there will be BLE comms with the apollo3 evb but the image data will fail to be retrieved

What I observe is that in /apache-mynewt-core/mgmt/smp/transport/ble/src/smp_ble.c, if you print out the contents of om in smp_ble_out that gets put onto the mqueue, it will not match the contents when you get the mbuff back in smp_ble_event_data_in even though the address of the mbuf is the same.

I set a watchpoint on the address of the corrupted mbuf and was able to get the following backtrace:

#0  smp_tx_rsp (ns=<optimized out>, rsp=0x100025a4 <os_msys_1_data+112>, arg=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:226
#1  0x0001c418 in smp_process_request_packet (streamer=streamer@entry=0x10002f80 <g_smp_ble_transport>, req=0x10004408 <pool_acl_buf>)
    at repos/apache-mynewt-mcumgr/smp/src/smp.c:365
#2  0x00016e50 in smp_process_packet (st=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:265
#3  0x00016e78 in smp_event_data_in (ev=<optimized out>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:293
#4  0x00015526 in os_eventq_run (evq=<optimized out>) at repos/apache-mynewt-core/kernel/os/src/os_eventq.c:196
#5  0x00014aea in main () at apps/bleprph/src/main.c:356

What seems to happen is that in this line of code the mbuf returned is the same address as the one used to store the notify data so the mbuf gets corrupted. This behavior does not seem to be present on the nrf52840dk pca10056

@t3zeng
Copy link

t3zeng commented Jun 2, 2022

Adding to the above, I set up some checks and increased the size of the task stack to 8192 words
OS_CTX_SW_STACK_CHECK: 1
OS_MEMPOOL_CHECK: 1
OS_MEMPOOL_GUARD: 1
OS_MEMPOOL_POISON: 1
The issue persists despite the stack usage is very far off from maxing out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants