Please give iperf-3.16 a try (or, for people reading this in the future, some appropriate possibly-later version). It supports multi-threading, which allows iperf3 to use multiple CPU cores (for multiple parallel streams).
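For example, with 3.16 or later each parallel stream runs in its own thread, so a multi-stream run can actually spread across cores (the address below is from this thread's setup; the stream count is just an example):
iperf3 -c 10.14.44.5 -P 4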
At 100G speeds, it's very important to specify the same cores every time on both client and server, as different cores will have different performance. An easy way to do this is to use numactl, or the iperf3 -A flag.
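For example (core and node numbers are illustrative; pick a core on the NUMA node local to the NIC):
iperf3 -s -A 4
iperf3 -c 10.14.44.5 -A 4,4
On the client, the -A n,m form pins the client to core n and asks the server to use core m for that test. Alternatively, bind the whole process with numactl:
numactl --cpunodebind=0 --membind=0 iperf3 -c 10.14.44.5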
Hi everyone,
I'm testing throughput on a 100G interface, and the results show that iperf3 v3.9 reports lower network throughput than iperf 2.1.5. Here are the benchmark results with iperf3 and iperf2.
Are there any more CPU performance tweaks that I'm missing? Any recommendations are appreciated.
#iperf3 --version
iperf 3.9 (cJSON 1.7.13)
iperf -v
iperf version 2.1.5 (3 December 2021) pthreads
Context
Version of iperf3: 3.9 (cJSON 1.7.13)
Hardware:
lspci | grep -i Mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
03:00.1 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Operating system (and distribution, if any):
cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Bug Report
Desired: [SUM] 0.0000-10.0083 sec 92.5 GBytes 87.2 Gbits/sec
Current (iperf2, -P 4): [SUM] 0.0000-10.0012 sec 72.1 GBytes 61.9 Gbits/sec
I have tuned the hosts following this URL: https://fasterdata.es.net/host-tuning/40g-tuning/.
Step 1
Allow TCP buffers up to 2 GB:
net.core.rmem_max = 2147483647
net.core.wmem_max = 2147483647
Increase the Linux autotuning TCP buffer limit to 1 GB:
net.ipv4.tcp_rmem = 4096 65536 1073741824
net.ipv4.tcp_wmem = 4096 65536 1073741824
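For reference, these can be applied at runtime with sysctl (a sketch; persist them in /etc/sysctl.conf or a drop-in under /etc/sysctl.d/ to survive reboots):
sysctl -w net.core.rmem_max=2147483647
sysctl -w net.core.wmem_max=2147483647
sysctl -w net.ipv4.tcp_rmem="4096 65536 1073741824"
sysctl -w net.ipv4.tcp_wmem="4096 65536 1073741824"
sysctl -p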
Step 2
Set the CPU frequency governor to performance:
sudo cpupower frequency-set -g performance
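To confirm the governor took effect, every core should report "performance":
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor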
Step 3
NIC tuning: increase the ring buffer size to the maximum (8192), and confirm that interrupt coalescing is on.
/usr/sbin/ethtool -G ethN rx 8192 tx 8192
ethtool -g ens2f1np1
Ring parameters for ens2f1np1:
Pre-set maximums:
RX: 8192
RX Mini: n/a
RX Jumbo: n/a
TX: 8192
Current hardware settings:
RX: 8192
RX Mini: n/a
RX Jumbo: n/a
TX: 8192
#/usr/sbin/ethtool -C ethN adaptive-rx on adaptive-tx on
#ethtool -c ens2f1np1
Coalesce parameters for ens2f1np1:
Adaptive RX: on TX: on
stats-block-usecs: n/a
sample-interval: n/a
pkt-rate-low: n/a
pkt-rate-high: n/a
rx-usecs: 32
rx-frames: 64
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 8
tx-frames: 128
tx-usecs-irq: n/a
tx-frames-irq: n/a
rx-usecs-low: n/a
rx-frame-low: n/a
tx-usecs-low: n/a
tx-frame-low: n/a
rx-usecs-high: n/a
rx-frame-high: n/a
tx-usecs-high: n/a
tx-frame-high: n/a
CQE mode RX: on TX: off
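A related check that ties into the core-pinning advice in the replies above: confirm which NUMA node the NIC is attached to, so iperf can be pinned to local cores (interface name as in the ethtool output):
cat /sys/class/net/ens2f1np1/device/numa_node
numactl -H
The first command prints the NIC's node number; the second lists which CPUs belong to each node.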
Step 4
IOMMU (Input-Output Memory Management Unit) settings
IOMMU support is configured in GRUB:
cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.15.0-100-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro amd_iommu=on iommu=pt
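For anyone reproducing this on Ubuntu, those flags are typically set via GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.:
GRUB_CMDLINE_LINUX="amd_iommu=on iommu=pt"
then:
sudo update-grub
and a reboot.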
Even after performing the above tuning, I'm still getting the throughput below:
#iperf -c 10.14.44.5 -P 4
[ 2] local 10.14.44.4 port 50240 connected with 10.14.44.5 port 5001
[ 3] local 10.14.44.4 port 50248 connected with 10.14.44.5 port 5001
[ 4] local 10.14.44.4 port 50260 connected with 10.14.44.5 port 5001
Client connecting to 10.14.44.5, TCP port 5001
TCP window size: 85.0 KByte (default)
[ 1] local 10.14.44.4 port 50224 connected with 10.14.44.5 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0058 sec 18.9 GBytes 16.2 Gbits/sec
[ 2] 0.0000-10.0058 sec 18.2 GBytes 15.6 Gbits/sec
[ 4] 0.0000-10.0058 sec 19.0 GBytes 16.3 Gbits/sec
[ 1] 0.0000-10.0219 sec 16.0 GBytes 13.7 Gbits/sec
[SUM] 0.0000-10.0012 sec 72.1 GBytes 61.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.325/0.385/0.433/0.045 ms (tot/err) = 4/0
#iperf3 -c 10.14.44.5 -t 10 -b 100G
Connecting to host 10.14.44.5, port 5201
[ 5] local 10.14.44.4 port 44744 connected to 10.14.44.5 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.43 GBytes 20.9 Gbits/sec 513 8.21 MBytes
[ 5] 1.00-2.00 sec 2.09 GBytes 18.0 Gbits/sec 0 8.21 MBytes
[ 5] 2.00-3.00 sec 2.31 GBytes 19.9 Gbits/sec 0 9.49 MBytes
[ 5] 3.00-4.00 sec 1.20 GBytes 10.3 Gbits/sec 0 9.49 MBytes
[ 5] 4.00-5.00 sec 2.06 GBytes 17.7 Gbits/sec 0 9.49 MBytes
[ 5] 5.00-6.00 sec 2.16 GBytes 18.6 Gbits/sec 0 9.49 MBytes
[ 5] 6.00-7.00 sec 2.17 GBytes 18.6 Gbits/sec 0 9.49 MBytes
[ 5] 7.00-8.00 sec 2.17 GBytes 18.6 Gbits/sec 0 9.49 MBytes
[ 5] 8.00-9.00 sec 2.16 GBytes 18.6 Gbits/sec 0 9.49 MBytes
[ 5] 9.00-10.00 sec 2.17 GBytes 18.6 Gbits/sec 0 9.49 MBytes
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 20.9 GBytes 18.0 Gbits/sec 513 sender
[ 5] 0.00-10.04 sec 20.9 GBytes 17.9 Gbits/sec receiver
Enhancement Request
Current behavior
#iperf -s
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
[ 1] local 10.14.44.5 port 5001 connected with 10.14.44.4 port 44300
[ 2] local 10.14.44.5 port 5001 connected with 10.14.44.4 port 44318
[ 3] local 10.14.44.5 port 5001 connected with 10.14.44.4 port 44302
[ 4] local 10.14.44.5 port 5001 connected with 10.14.44.4 port 44290
[ ID] Interval Transfer Bandwidth
[ 2] 0.0000-10.0029 sec 16.6 GBytes 14.3 Gbits/sec
[ 3] 0.0000-10.0025 sec 18.6 GBytes 16.0 Gbits/sec
[ 1] 0.0000-10.0078 sec 18.4 GBytes 15.8 Gbits/sec
[ 4] 0.0000-10.0030 sec 18.8 GBytes 16.1 Gbits/sec
[SUM] 0.0000-10.0083 sec 72.5 GBytes 62.2 Gbits/sec
Desired behavior
[SUM] 0.0000-10.0083 sec 92.5 GBytes 87.2 Gbits/sec
Implementation notes