
Packet loss (almost) explained

I like telling UDP jokes because I do care if you don't get them.

One of the main issues when using data diodes is that you cannot check whether the UDP packets you send actually arrived at the receiving proxy. During tests with large files we found that the receiving application is often not able to read the UDP packets from the rx_queue quickly enough. In this overview we try to explain this.

Please open an issue for this article if you can help, because we still have not found a final solution for this problem.

Overview of the packet loss

In this overview we use 5 steps to explain the issue:

  1. The sender application reads or receives the data to be sent
  2. The application sends the data to the TX_QUEUE in kernel space
  3. The UDP packets are sent over the cable, through the data diode, to the receiver
  4. The receiver receives the UDP packets and places them in the RX_QUEUE
  5. The receiver application reads the UDP packets from the RX_QUEUE, processes them and writes the packets to the file

We found out that most of the packet loss happens on the receiving machine, when the application is not able to read the RX_QUEUE fast enough. Once the RX_QUEUE is full the kernel drops incoming packets and data is lost. A quick way to check this on the receiver is shown below.
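
A minimal check, assuming a Linux receiver: the kernel keeps per-protocol UDP counters, and RcvbufErrors counts the packets that were dropped because the socket receive buffer (the RX_QUEUE) was full.

```
# On the receiver: RcvbufErrors = packets dropped because the RX_QUEUE was full,
# InErrors includes these drops as well.
nstat -az UdpInErrors UdpRcvbufErrors

# The same counters are visible in /proc/net/snmp
grep Udp: /proc/net/snmp
```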

Possible solutions

  1. Send the data more slowly from the sender. The applications UDPcast and pv (pipe viewer) are able to send the data at a lower rate (see the sketch after this list).
  2. Enlarge the RX_QUEUE using sysctl. This only helps for small bursts; when sending multiple GBs it has no use.
  3. Use FEC (Forward Error Correction). The application UDPcast has an option to add FEC (also shown in the sketch after this list).
  4. Faster processing in the receiving application. Try to reduce the CPU load of the receiving application and/or limit other applications using the CPU.
  5. Faster writing to disk. Use an SSD or a memdisk to write the data.
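
A sketch of solutions 1 and 3 with UDPcast. The flags (--async, --max-bitrate, --fec), the interface name eth0, the port 9000 and the rates are assumptions; check the udp-sender man page of your version before using this.

```
# Sender: limited bitrate plus FEC, in asynchronous (one-way) mode for the diode.
udp-sender --async --max-bitrate 300m --fec 8x8/128 \
           --interface eth0 --portbase 9000 --file bigfile.img

# Receiver: write the stream to disk (or to a memdisk, see solution 5).
udp-receiver --interface eth0 --portbase 9000 --file bigfile.img

# Alternative for solution 1: rate-limit the pipeline with pv (here 100 MB/s)
# and let udp-sender read from stdin.
pv -L 100m bigfile.img | udp-sender --async --fec 8x8/128 --interface eth0 --portbase 9000
```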

Monitoring packet loss

Monitor the UDP queues and the number of packets dropped. We are still looking into the best way to monitor the queues; it can be done in several ways. Monitor both the sender and the receiver.

cat /proc/net/udp
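
In /proc/net/udp the number after the colon in the tx_queue:rx_queue column is the amount of data waiting in the receive queue. A simple way to watch it for the receiver socket (the grep filter is our assumption, any filter will do):

```
# Ports are printed in hex: 9000 decimal = 2328 hex.
watch -n 1 'grep ":2328" /proc/net/udp'
```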

or use netstat

sender
sudo netstat -c -u -a -n | grep "9001"

receiver
sudo netstat -c -u -a -n | grep ".255:9000"
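
netstat can also print the kernel's UDP statistics directly; the receive buffer errors counter there corresponds to the RX_QUEUE overflows described above:

```
netstat -su
```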

or use ss

watch -n 1 "ss -u -a -n -p '( sport = :9000 )'"
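
ss can also show per-socket memory and drop counters with -m; in the skmem output rb is the receive-buffer limit and d is the number of packets that socket has dropped (the port filter is the same assumed port 9000 as above):

```
# skmem:(r<queued>,rb<rcvbuf limit>, ... ,d<drops>)
watch -n 1 "ss -u -a -n -m '( sport = :9000 )'"
```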

or use traffic control (tc) to check for drops in the transmit queue (mostly useful on the sending side).

tc -s qdisc show dev eth0

Bottom line, we noticed that the Recv-Q fills up and the UDP-receiver application stops or crashes. Recv-Q is the count of bytes not copied by the user program connected to this socket.

Apart from increasing the sysctl net.core.rmem_max we are still looking for a way to prevent the Recv-Q from overflowing and dropping packets. Please comment if you can help.

sudo sysctl -w net.core.rmem_max=32777216
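
A note from our side: net.core.rmem_max only raises the ceiling that an application may request with setsockopt(SO_RCVBUF). If the receiving application does not set SO_RCVBUF itself, the default buffer size has to be raised as well, for example:

```
# Default receive buffer for sockets that do not call setsockopt(SO_RCVBUF);
# 32 MiB here as an example, pick a value that matches rmem_max.
sudo sysctl -w net.core.rmem_default=33554432
```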

Monitoring softnet_stat we saw no increase in its drop counter (the second column, which counts packets dropped because the kernel backlog was full), so the loss does not seem to happen at that level.

cat /proc/net/softnet_stat

Monitoring using Dropwatch

https://www.cyberciti.biz/faq/linux-show-dropped-packets-per-interface-command/

sudo apt-get install libpcap-dev libnl-3-dev libnl-genl-3-dev binutils-dev libreadline6-dev autoconf libtool pkg-config build-essential

git clone https://github.com/nhorman/dropwatch
cd dropwatch
./autogen.sh
./configure
make
sudo make install

check the current kptr_restrict setting and set it to 0, so dropwatch can resolve kernel addresses

cat /proc/sys/kernel/kptr_restrict
sudo su
echo 0 > /proc/sys/kernel/kptr_restrict

list the available lookup methods

dropwatch -l list

run dropwatch with kernel symbol lookup

sudo dropwatch -l kas

then type start at the dropwatch> prompt to begin reporting where packets are dropped

start

To do 1

CPU - NIC isolation: see issue #10 for a first step in the exploration.

To do 2

A good story about tuning TCP. Perhaps lowering the garbage collection overhead described there could also help to reduce packet loss on the receiving side.

https://blog.cloudflare.com/the-story-of-one-latency-spike/

https://unix.stackexchange.com/questions/611981/udp-packet-drops

Helpful links

https://arthurchiao.github.io/blog/monitoring-network-stack/

https://www.sobyte.net/post/2022-05/linux-udp-packet-drop-debug/

https://iopscience.iop.org/article/10.1088/1748-0221/15/09/T09005/pdf
Appendix "B System configuration" of the paper above lists the following commands (performed as superuser) to change the system parameters on CentOS. The examples below modify network interface eno49; this should be changed to match the name of the interface on the actual system.

sysctl -w net.core.rmem_max=12582912
sysctl -w net.core.wmem_max=12582912
sysctl -w net.core.netdev_max_backlog=5000
ifconfig eno49 mtu 9000 txqueuelen 10000 up
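
On systems where the legacy ifconfig tool is not installed, the same interface settings can be applied with ip (our addition; eno49 is the example interface name from the paper):

```
# Same MTU and transmit queue length as the ifconfig line above.
ip link set dev eno49 mtu 9000 txqueuelen 10000 up
```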

More details about UDP tuning:
https://gilbertasm.github.io/2018/09/13/tunning-udp-buffers.html
https://blog.packagecloud.io/monitoring-tuning-linux-networking-stack-sending-data/#monitoring-udp-protocol-layer-statistics