All of the experiments require two servers interconnected via a 100-Gbps link. One of the servers (i.e., `client`) generates packets, while the other one (i.e., `dut`) acts as the device under test. Our testbed configuration is shown in the following table.
| Server | Processor | NIC |
|---|---|---|
| Packet Generator | Intel Xeon Gold 6134 @ 3.2 GHz | Mellanox ConnectX-5 VPI |
| DUT | Intel Xeon Gold 6140 @ 2.3 GHz | Mellanox ConnectX-5 VPI |
Our experiments require `npf`, `dpdk` (on both servers), `fastclick` (on both servers), `pcm` (on the dut), `intel-cmt-cat` (on the dut), and `Splash-3` (on the dut). You should set up your testbed based on the following guidelines.
We use the NPF tool to run our experiments. However, it is possible to run them manually. To set up NPF, you can run the following commands:
git clone https://github.com/tbarbette/npf.git && cd npf
sudo apt install python3 python3-pip libssl-dev
pip3 install --user -r requirements.txt
You should also define your servers' and NICs' configuration in NPF. To do so, you can define `pkt-gen.node` and `dut.node` in `npf/cluster` according to the example node files. For example, one of our servers is defined as follows:
path=/home/alireza/ddio-bench/npf/
addr=server0.kth.se
arch=skylake
nfs=0
0:mac=50:6b:4b:43:88:ca
0:ip=192.168.100.13
0:ifname=enp23s0f0
0:pci=0000:17:00.0
1:mac=50:6b:4b:43:88:cb
1:ip=192.168.101.13
1:ifname=enp23s0f1
1:pci=0000:17:00.1
- `path` is the path to NPF on a given machine. If you do not have Network File System (NFS) in your testbed, you should uncomment `nfs=0`.
- `addr` is the address of each machine. It can be either its URL or its IP address.
You should define four variables for every NIC port:

- `mac`: The MAC address of the interface
- `ip`: The IP address of the interface
- `ifname`: The interface name
- `pci`: The PCIe address of the interface

The first three variables can be found via `ifconfig`. To find the PCIe address of an interface, you can run `sudo lspci -v | grep Mellanox`. It is important to set these values carefully, as NPF substitutes them into its scripts.
NPF uses ssh to connect to the servers. Please make sure that you have set up passwordless ssh on your system. If you want to use your current key (e.g., `id_rsa`) on a different server, you can run the following commands:
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa
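If your public key is not yet authorized on a server, one way to copy it over is shown below; the username and address are taken from the example node file above, so adjust them for your own machines:
ssh-copy-id -i ~/.ssh/id_rsa.pub alireza@server0.kth.se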
You can check the NPF README file for more information.
We use DPDK to bypass the kernel network stack in order to achieve line rate in our tests. To build DPDK, you can run the following commands. Note that Mellanox drivers (e.g., `MLX5`) are not enabled by default; if you have Mellanox cards, you should enable them in `dpdk/config/common_base` (see the example below).
git clone https://github.com/DPDK/dpdk.git
cd dpdk
git checkout v20.02
make install T=x86_64-native-linux-gcc
To build DPDK, you can also use `dpdk/usertools/dpdk-setup.sh`.
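If you have Mellanox NICs, enable the MLX5 PMD after `git checkout v20.02` but before running `make install`. With DPDK v20.02, the relevant option in `dpdk/config/common_base` is `CONFIG_RTE_LIBRTE_MLX5_PMD`; for example:
sed -i 's/CONFIG_RTE_LIBRTE_MLX5_PMD=n/CONFIG_RTE_LIBRTE_MLX5_PMD=y/' config/common_base
Note that building the MLX5 PMD requires the Mellanox OFED (or rdma-core) libraries to be installed.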
After building DPDK, you have to define `RTE_SDK` and `RTE_TARGET`. To do so, run:
export RTE_SDK=/home/alireza/ddio-bench/dpdk/
export RTE_TARGET=x86_64-native-linux-gcc
You can try running `testpmd` to check whether DPDK is working properly on your system.
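For example, with the make-based build above, `testpmd` sits under the target directory. The core list and PCIe address below are placeholders taken from the example node file, so adapt them to your system:
cd $RTE_SDK
sudo ./x86_64-native-linux-gcc/app/testpmd -l 0-1 -n 4 -w 0000:17:00.0 -- -i
Inside the interactive prompt, `show port info 0` should list your Mellanox port.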
We use Fastclick to generate and process packets. To build Fastclick, you can run the following commands. We have implemented a new element for Fastclick, called DDIOTune, to enable/disable/tune DDIO. To use this element, you have to compile Fastclick with the `--enable-research` and `--enable-dpdk` flags and install the `libpci-dev` and `cpuid` packages. Note that DDIOTune requires at least GCC 6.0 to be built successfully.
git clone git@github.com:tbarbette/fastclick.git
cd fastclick
sudo apt-get install libpci-dev cpuid
./configure RTE_SDK=/home/alireza/ddio-bench/dpdk RTE_TARGET=x86_64-native-linux-gcc --enable-multithread --disable-linuxmodule --enable-intel-cpu --enable-user-multithread --verbose CFLAGS="-std=gnu11 -O3" CXXFLAGS="-std=gnu++14 -O3" --disable-dynamic-linking --enable-poll --enable-dpdk --disable-dpdk-pool --disable-dpdk-packet --enable-research
make
Note that you should change `RTE_SDK` and `RTE_TARGET` in the `configure` command.
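After the build finishes, you can sanity-check the result with a minimal forwarding configuration. This is only a sketch, not one of the paper's experiments: the EAL core list is a placeholder, and the DDIOTune element can be added later according to the element's documentation.
cat > forward.click <<'EOF'
// Receive on DPDK port 0, swap the Ethernet addresses, and send the packet back out
FromDPDKDevice(0) -> EtherMirror() -> ToDPDKDevice(0);
EOF
sudo bin/click --dpdk -l 0-1 -- forward.click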
We use the PCM tool to monitor the Intel performance counters introduced for PCIe. Specifically, we measure the `ItoM` and `PCIeRdCur` events via `pcm/pcm-pcie.x`, which reflect the performance of DDIO. You can find more information about these events either in our paper or here.
To build the PCM tool, you can run the following commands.
git clone https://github.com/opcm/pcm.git
cd pcm
make
sudo make install
To check your installation, run `sudo pcm-pcie`.
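You can also run the binary directly from the build directory with an update interval in seconds. On our Skylake servers the output includes the `ItoM` and `PCIeRdCur` columns mentioned above; the exact column set depends on the PCM version and CPU:
sudo ./pcm-pcie.x 1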
We use the Intel RDT software package to configure Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA), and to measure per-core Last Level Cache (LLC) misses via the `pqos` command. To set up `pqos`, you can run the following commands.
git clone https://github.com/intel/intel-cmt-cat.git
cd intel-cmt-cat
make
sudo make install
To check your installation, run `sudo pqos`. You can find more information about CAT and `pqos` in the intel-cmt-cat wiki.
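As an illustration (the core numbers and bitmasks below are placeholders, not the exact settings used in the paper), the following commands define a class of service restricted to two LLC ways, associate cores 0-3 with it, and monitor LLC occupancy and misses:
# define class of service (COS) 1 with a 2-way LLC bitmask
sudo pqos -e "llc:1=0x3"
# associate cores 0-3 with COS 1
sudo pqos -a "llc:1=0-3"
# monitor LLC occupancy/misses and memory bandwidth on cores 0-17
sudo pqos -m "all:0-17"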
Below, you can find some useful hints for optimizing your testbed.
To have more accurate measurements, we recommend isolating one CPU socket. By doing so, you ensure that the operating system is not polluting your cache. To do so, you can add the following option to the GRUB command line in `/etc/default/grub`. To find the cores located on a CPU socket, you can use the `lscpu` command.
GRUB_CMDLINE_LINUX="isolcpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17"
There are more options for `GRUB_CMDLINE_LINUX` to optimize your system. We used the following:
GRUB_CMDLINE_LINUX="isolcpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17 iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=128 acpi=on selinux=0 audit=0 nosoftlockup processor.max_cstate=1 intel_idle.max_cstate=0 intel_pstate=on nopti nospec_store_bypass_disable nospectre_v2 nospectre_v1 nospec l1tf=off netcfg/do_not_use_netplan=true mitigations=off"
Note that you need to run `sudo update-grub` and reboot your system after changing `/etc/default/grub`.
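After rebooting, you can verify that the isolation and the other options took effect, for example:
cat /sys/devices/system/cpu/isolated
cat /proc/cmdline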
It is essential to check the processor's frequency before running any experiment so that you can reproduce them later.
- Checking: To check the processor's frequency, you can use one of the following methods:
lscpu | grep "CPU" | grep "MHz"
cat /proc/cpuinfo | grep "processor\|MHz"
sudo lshw -c cpu | grep "*-cpu\|MHz"
sudo dmidecode -t processor | grep Speed
sudo watch -n 1 cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
The last method prints the frequency of each core every second. Note that instead of using `cpu*`, you can print the frequency of a subset of cores (e.g., `cpu[0-7]`).
- Changing: To change the frequency of the processors, you can use the `cpupower` tool, as follows:
sudo cpupower frequency-set -u 2.3G -d 2.3G
`-u` specifies the maximum frequency, while `-d` specifies the minimum frequency for each core. To have a fixed frequency, you can set both of them to a single value (e.g., `2.3G`). `sudo cpupower frequency-info | grep "hardware limits"` or `lscpu | grep "CPU" | grep "MHz"` print the minimum/maximum frequencies of the processor.
- You can get the nominal frequency of the processor via:
cat /proc/cpuinfo | grep model | grep -o "[0-9.]\+G" | head -n 1
For more information, check here.
We disable PAUSE frames in our experiments, which is done automatically by Fastclick. However, you can also enable/disable/check the status of PAUSE frames for every interface via `ethtool`.
ethtool -a enp23s0f0
sudo ethtool -A enp23s0f0 rx off tx off
It is possible to measure the number of PAUSE frames via `ethtool` and `dpdk`.
- ethtool: `sudo ethtool -S enp23s0f0 | grep pause_ctrl`. For more info, check here.
- dpdk: You can read ethtool statistics in DPDK via the `xstats` API. However, some of the Mellanox counters (e.g., `rx_pause_ctrl` and `tx_pause_ctrl`) are not available by default. To measure PAUSE frames via dpdk, you should add the counters to `static const struct mlx5_counter_ctrl mlx5_counters_init[]` located in `dpdk/drivers/net/mlx5/mlx5_stats.c`, as follows:
static const struct mlx5_counter_ctrl mlx5_counters_init[] = {
{
.dpdk_name = "rx_port_unicast_bytes",
.ctr_name = "rx_vport_unicast_bytes",
},
...
+ {
+ .dpdk_name = "rx_pause_ctrl_phy",
+ .ctr_name = "rx_pause_ctrl_phy",
+ },
+ {
+ .dpdk_name = "tx_pause_ctrl_phy",
+ .ctr_name = "tx_pause_ctrl_phy",
+ },
};
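After rebuilding DPDK (and relinking your application), the added counters should appear in the `xstats` output; for instance, you can inspect them interactively in testpmd with:
testpmd> show port xstats 0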
We set `CQE_COMPRESSION` to aggressive mode in order to save PCIe bandwidth in our experiments. To check the status of `CQE_COMPRESSION`, you can run the following commands.
sudo mst start
sudo mlxconfig -d /dev/mst/mt4119_pciconf0 q | grep CQE
To change its status, you can run `sudo mlxconfig -d /dev/mst/mt4119_pciconf0 set CQE_COMPRESSION=1`.
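As with other `mlxconfig` settings, the new value typically takes effect only after the NIC firmware is reset or the machine is rebooted.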
Note that you need to install Mellanox `OFED` and `MFT` before being able to run the mentioned commands. For more info, check the Mellanox manual for your NIC.