Skip to content

Latest commit

 

History

History
executable file
·
396 lines (284 loc) · 14.8 KB

lwIP-notes.adoc

File metadata and controls

executable file
·
396 lines (284 loc) · 14.8 KB

Some lwIP Notes

lwIP is used for creation of a network interface of YAPicoprobe to avoid an uncountable number of additional CDC COM ports.

Currently a Segger SysView server is listening on port 19111. But I guess there is more to come.

Pitfalls

lwIP had some traps waiting for me. Despite reading the official Common Pitfalls I took some of them.

MAC

Do not use a random MAC! At least not for the first byte. I was "lucky" choosing one with an odd first byte. Unfortunately bit 0 marks a group address. At least Linux rejects such MAC addresses with a more or less useless error message in the syslog. Took a while until I found out that the MAC address was the culprit for no communication.
Finally I have used 0xfe as the first byte and the remaining five bytes of the MAC address are copied from the last bytes of Picos serial number (which actually is the serial number of the external flash).

OS Mode

really take it serious what they are writing about "raw" APIs and only using the TCPIP thread for any calls to it.
Use tcpip_callback_with_block(<func>, <void-arg>, 0) for non-blocking invocations of some function. This also saves you from creating extra simple threads for communication tasks line in the TineUSB/lwIP glue code. That was my first idea and it took a while until I found out how bad that idea was.
Same was true for the thread(!) which stuffed data for SysView into it’s server: bad idea!
Effect of wrong API handling were random crashes or connection disruptions.

RNDIS/ECM/NCM

To be honest, I’m very confused about the system behaviour of RNDIS/ECM/NCM. Sometimes the host gets "disconnected", because the DNS in /etc/resolv.conf are changed, sometimes not. Sometimes the probe needs dhclient to get the TCP/IP connection, sometimes not. Sometimes the probe has an IPv6 address, sometimes not. And this all just on my Linux host. Interoperation with Windows makes things even worse.
And /etc/network/interfaces generates error messages even when the device has allow-hotplug.

  • RNDIS: this was my former favorite, because it is supported by all relevant OSs. Also throughput seemed to be good. Unfortunately RNDIS seems to manipulate routing in a way that the default route on my Linux wants to go through the probe. Not really what I want…​

Note

RNDIS on Win10 works only, if RNDIS on the probe is the only USB class selected. So it is possible to create a special probe firmware which provides only features like SystemView, but you cannot have a probe which does SystemView and CMSIS-DAP.
This is not a fault of lwIP, it is a bug in the Win10 driver(?).

  • ECM: works good, packets are transferred continuously, throughput also seems to be ok. So this is the way to go.
    Unfortunately there is no driver integrated into Win10, so possible extra trouble appears. Yes…​ extra trouble: cannot find a driver for Win10.

  • NCM: is said to be the best choice. And in fact it is. At least after creation of a ncm_device_simple.c driver which is a stripped down version of ncm_simple.c which revealed as very buggy.
    Now thoughput under Linux and Windows is ok. Operation with SystemView works without glitches, iperf tests sometimes crashes the probe. So consider this driver as beta and work in progress.

Performance

To measure performance, iperf is used (which implies, that OPT_NET_IPERF_SERVER must be set on built). Good command line for measurement:

iperf -c 192.168.14.1 -e -i 1 -l 1024

Testing

Good test cases are the following command lines:

for MSS in 90 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l 1024 -M $MSS; sleep 10; done
for LEN in 40000 16000 1400 1024 800 200 80 22 8 2 1; do for MSS in 90 93 100 150 200 255 256 300 400 500 511 512 600 700 800 900 1000 1100 1200 1300 1400 1450 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l $LEN -M $MSS; sleep 2; done; done

Monitor performance/errors with Wireshark.

Some words about…​

net_glue.c

I’m really trying hard to switch context between lwIP and TinyUSB correctly. This leads to some kind of delayed call chains and also does not make the code neither nice nor very much maintainable.

NCM

TinyUSB NCM driver implementations is more or less buggy, so I’m doing my best implementing a driver on my own.

Current work consists of debug versions of ncm_device and an almost working ncm_device_simple.

Link:

ncm_device_simple.c

ncm_device_simple.c is actually a mixture of ecm_rndis_device.c and ncm_device.c. From ecm_rndis_device.c the structure has been inherited and from ncm_device.c the functionality.
The driver can be considered work in progress, because in conjunction with iperf crashes sometimes happen. But for operation with SystemView quality seems to be good enough.

Warning
There must be a nasty bug in ncm_simple_device. It reveals on startup because startup time with ncm_device_simple is much longer compared to ECM/RNDIS and even ncm_device.

possible bugs in ncm_device.c

This is more or less obsoleted by ncm_device_simple.c. But as a short summary: the original driver is very buggy. Perhaps it is working in certain scenarios, but for sure not together with SystemView.

  • not sure, but perhaps it is best to call all functions within ncm_device in the FreeRTOS context of TinyUSB

  • wNtbOutMaxDatagrams must be set to 1 [2023-06-27]

    • iperf runs then

    • Systemview still has problems

    • wNtbOutMaxDatagrams == 0 generates a lot of retries with iperf

  • I guess that the major problem lies within handle_incoming_datagram() because it changes values on an incoming packet although tud_network_recv_renew() is still handling the old one

  • is there multicore a problem!? (14.7.2023: no!) I have seen retries with multicore even with wNtbOutMaxDatagrams = 1

  • I think it is assumed, that TinyUSB and lwIP are running in the same task (but in my scenario they don’t)

  • if removing debug messages, then the receive path seems to work better, which indicates a race condition somewhere

There is an open issue in the TinyUSB repo for this issue: hathach/tinyusb#2068

TinyUSB Driver API

TinyUSB → Driver

The following API is for calls from TinyUSB to the driver. The calls are all initiated from within the TinyUSB stack. Thus all are done in the context of TinyUSB.

Name Comment

netd_init()

Initialization of the driver on startup. Called several times.

netd_reset(rhport)

Called several times on startup. rhport seems to be zero in all calls.

netd_open(rhport, *itf_desc, max_len)

Connects the USB endpoints. This is called once when the host driver connects with the device.

netd_control_xfer_cb(rhport, stage, *request)

called after netd_open(). Only stage==CONTROL_STAGE_SETUP seems to be of interest.

netd_xfer_cb(rhport, ep_addr, result, xferred_bytes)

Depending on ep_addr the driver is told here, that a

  • packet can be fetched from the stack for further processing within the driver

  • packet transmission can be started

  • notification packet should be transmitted (that’s about communication parameters)

Driver → TinyUSB

The driver has a whole bunch of available API calls. The most important are:

Name Comment

tud_control_status()

Send STATUS (zero length) packet. Called in netd_control_xfer_cb().

tud_control_xfer()

Carry out Data and Status stage of control transfer. Called in netd_control_xfer_cb().

usbd_edpt_busy()

Check whether an endpoint is busy or ready for the next usbd_edpt_xfer().

usbd_edpt_open()

Used during netd_open().

usbd_edpt_xfer()

Submit a USB transfer. For receive operation, the specified buffer must be empty. For transmit operation, the buffer may not be touched, until the corresponding netd_xfer_cb() is received.

usbd_open_edpt_pair()

Used during netd_open().

Glue Logic → Driver

The following API is for call from glue logic to the driver. The glue logic tries hard to issue the calls in the TinyUSB context as well. But this is not guaranteed I’m afraid (other developers).

Name Comment

tud_network_can_xmit(size)

check if the driver buffer allows another datagram with the specified size. If the driver tells the glue logic that there is space enough for the datagram, the glue logic calls in the next step tud_network_xmit().
Not sure how recovery works if there is no space left. So at the moment the glue logic is responsible for retries.

tud_network_link_state_cb(state)

seems to be obsolete. No call found within the stack. So do not implement. PR at TinyUSB pending to remove the call.

tud_network_recv_renew()

Called when the glue logic has the opinion that the driver should check if it can enable receive logic. The function has to check, if the USB channel and receive buffer are available. Another option (for NCM) is, that there are still buffered datagrams which can then be transferred via tud_network_recv_cb().

tud_network_xmit(*ref, arg)

The glue logic requests a datagram transfer into the driver. The driver may then prepare for the actual copy operation from glue logic which is performed via tud_network_xmit_cb(dst, ref, arg). Transmission does not have to take place. E.g. the NCM driver should be capable of buffering multiple datagrams into one big NCM transfer block. The call must succeed.

Driver → Glue Logic

The glue logic also provides some API which has to be used by the driver. The driver always calls the glue logic in the TinyUSB context.

Name Comment

tud_network_recv_cb(*src, size)

Transfer a single datagram from the driver to the glue logic. When the layer above the glue logic (lwIP) has handled the datagram, it issues a tud_network_recv_renew() so the process of datagram reception does not die.

tud_network_xmit_cb(*dst, *ref, arg)

The driver calls this function from tud_network_xmit() to perform the actual copy operation of the datagram from glue logic into the driver. The two parameters are not changed by the driver, except that it specifies an additional copy destination.

The ncm_device_new driver & comparison

The following table holds a comparison between the several network drivers. The first seven bars are created with for MSS in 100 200 400 800 1200 1450 1500; do iperf -c 192.168.14.1 -e -i 1 -M $MSS -l 8192 -P 1; sleep 2; done. After that SystemView is started with almost maximum load (~85000 events/s, 325 KByte/s) and after a while iperf is started again in parallel.

The images are recorded with Wireshark. The red bars are "TCP Window Full" if not otherwise noted.

Driver

ECM
The driver shows expected behavior, nothing actually special.

ECM

RNDIS
Again nothing special.

RNDIS

Original NCM
The original NCM driver is very buggy as said. The red bars in the graph are not caused by "TCP Window Full". Obscure messages in Wireshark show that the protocol is more or less garbage.

Original NCM

Simple NCM
The simple NCM driver behaves much better, but revealed some weaknesses in parallel operation (it also had some overflows in SystemView without iperf in paralell. It had some stability issues and it also had bugs on startup of the probe which was the actual reason to create ncm_device_new.

Simple NCM

New NCM
ncm_device_new clearly shows best behavior. Throughput is best and during parallel operation there was no packet loss when iperf used large packets. Also no obsucre Wireshark messages in parallel operation.

New NCM

So obviously ncm_device_new is the clear winner: best in performance, best in functionality, best in compatibility. What else is needed?

Log

2023-05-12

  • for unknown reasons the probe is even with ECM in stutter mode, don’t know why, that worked before smoothly. Transfer rate is bad

  • systemview test program (NoOS) on the target:

    • it already worked with around 10000 events/s, now the limit is ~3000

    • if there is a SysTick ISR then SystemView is completely messed up. Checked that locking is included. Seems to be so.

2023-06-26

  • after some changes to rtt_console.c, net_sysview.c and net_glue.c ECM is working again as expected

2023-06-30

  • for debugging purposes reimplemented ncm_device_simple.c which can hold only one ethernet frame per NTB (NCM Transfer Block). This unfortunately requires that the original ncm_device.c must be outcommented via #if on top.

2023-07-14

  • did some performance tuning with lwIP and TinyUSB

  • stripped sources

  • BUG: ncm_device_simple sometimes crashes with iperf

2023-08-11

  • BUG: with ncm_device_simple startup time of the probe is much longer compared to ECM/RNDIS or even ncm_device. With startup time I mean the time until there is something visible on the probes debug output. For ECM/RNDIS/ncm_device this is almost instantly, with ncm_device_simple it takes ~10s!
    → reverted to ncm_device because SystemView runs without problems with it
    → solved with ncm_device_new

2023-08-16

  • new driver: ncm_device_new

    • works (better then ncm_device_simple), but

      • ✓ problems, if wNtbOutMaxDatagrams!=1 → see comment

      • ✓ iperf also shows problems if -P is > 1. I guess this is an iperf problem, because iperf and SystemView are running parallel without such errors

      • ❏ surprisingly iperf performance is much better with actual firmware. cmake-create-debugEE has just half performance

      • ✓ but all these problems also exists with ncm_device. Is it in the glue code? Possible, because the effect is also with ECM driver. No, not in the glue code, because iperf and SystemView work in parallel

      • ✓ packets/s is changing heavily, setting wNtbOutMaxDatagrams==0 helped to prevent raising of packet rate (sometimes there are two datagrams in one NTB even with SystemView)

    • how to continue?

      • ✓ need a test case where tud_network_can_xmit() collects datagrams. Currently there is always only one active xmit datagram, perhaps iperf with -P 4 does it.
        → this all happens under load testing

      • ✓ check if there is a problem in the glue code for datagram reception. Glue buffer freed too soon? No, I doubt it. But examples are few.

2023-08-18

  • the new driver is ready. Had some optimization loops, but now it seems to work pretty well