Fine-grained, network-wide visibility is vital for reliably maintaining and troubleshooting high-density, mega-scale modern data center networks that accommodate heterogeneous mission-critical applications. However, traditional management protocols such as SNMP fall short of high-resolution monitoring for highly dynamic data center networks due to their inefficient controller-driven, per-device polling mechanism. With end host-launched full-mesh pings, Pingmesh provides the maximum latency measurement coverage. Pingmesh is excellent but still flawed: it cannot extract hop-by-hop latency or look into the queue depth inside switches for in-depth analysis, yet for network applications such as load balancing, failure localization, and management automation, this underlying information is increasingly insightful. In-band Network Telemetry (INT), one of the killer applications of P4, allows probe or data packets to query device-internal states, such as queue depth and queuing latency, as they pass through the data plane pipeline; it is considered promising and has been embedded into several vendors' latest merchant silicon. As a chip-level primitive, INT simply defines the interaction between incoming packets and the device-internal states for monitoring. For network-wide telemetry, further orchestration on top of INT is needed.
There are two design patterns for achieving network-wide measurement coverage based on INT: distributed probing and centralized probing. HULA follows the distributed probing pattern and lets the ToR switches flood probes into the data center network's multi-rooted topology for measurement coverage. Since no probe sender has a global view of the network to coordinate with the others, one link will be repetitively monitored by many probes simultaneously, incurring huge bandwidth overhead. For high-resolution monitoring, the bandwidth waste becomes even worse. To overcome this limitation, centralized probing relies on the SDN controller to perform optimized probing path planning. For example, INT-path collects the network topology and generates non-overlapping probing paths that cover the entire network with a minimum number of paths using an Euler trail-based algorithm. INT-path is theoretically elegant but still has deployment flaws. First, it still explicitly relies on bandwidth-occupying probe packets. Besides, it embeds source routing into the probe packet to specify the route the probe takes, which bloats the probe header, especially for longer probing paths.
To tackle the above problems, in this work we propose INT-label, an ultra-lightweight in-band network-wide telemetry architecture. Distinct from previous work, INT-label follows a "probeless" architecture: an INT-label-capable device periodically labels device-internal states onto data packets rather than explicitly introducing probe packets. Specifically, on each outgoing port of the device, packets are sampled according to a predefined label interval T and labelled with the instantaneous device-internal states. As a result, INT-label can still achieve network-wide coverage with fine-grained telemetry resolution while introducing minor bandwidth overhead. Along a forwarding path consisting of different devices, the same packet is labelled independently, simply according to each local sampling decision; that is to say, INT-label is completely stateless and involves no probing path-related dependency. Therefore, there is no need to leverage the SDN controller for centralized path planning.
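To make the sampling rule concrete, below is a minimal Python sketch of the per-port labelling decision described above. It is purely illustrative: in the prototype this logic is implemented in P4 (my_int.p4), and the class name, metadata fields, and interval value are assumptions rather than the actual on-wire format.

import time

# Illustrative sketch of the per-port labelling decision of INT-label.
# The real data plane logic lives in P4 (my_int.p4); names, fields, and the
# interval value here are assumptions for explanation only.
LABEL_INTERVAL_T = 0.01  # label interval T: at most one label per 10 ms per port


class EgressPort:
    """One egress port that independently samples and labels passing packets."""

    def __init__(self, switch_id, port_id):
        self.switch_id = switch_id
        self.port_id = port_id
        self.last_label_time = 0.0

    def process(self, packet, queue_depth, queuing_latency):
        now = time.time()
        # Stateless per-hop decision: label the first packet seen after the
        # interval expires; no probing path or controller coordination needed.
        if now - self.last_label_time >= LABEL_INTERVAL_T:
            packet.setdefault("int_stack", []).append({
                "switch_id": self.switch_id,
                "port_id": self.port_id,
                "queue_depth": queue_depth,
                "queuing_latency": queuing_latency,
                "timestamp": now,
            })
            self.last_label_time = now
        return packet

Because each port only compares the current time against its own last label time, the decision is local and stateless with respect to any probing path.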
INT-label is decoupled from the topology, allowing seamless adaptation to link failures. Like INT, INT-label relies on the data plane programmability provided by P4, and the in-network labelling is designed to be transparent to the end hosts. The INT information is extracted at the last-hop network device and sent to the SDN controller for network-wide telemetry data analysis. To avoid telemetry resolution degradation due to potential loss of labelled packets on unreliable links, we further design a feedback mechanism that adaptively changes the label frequency once the controller becomes aware of packet loss by analyzing the telemetry results.
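As a rough illustration of this feedback mechanism, the sketch below adjusts the label interval multiplicatively based on the measured coverage rate; the thresholds and constants are assumptions, not values taken from the design.

# Minimal sketch of the controller-side feedback (assumed policy): when the
# measured coverage rate drops because labelled packets were lost, shrink the
# label interval (label more often); when coverage recovers, relax it back.
# All constants below are illustrative.
DEFAULT_INTERVAL = 0.01   # default label interval T in seconds
MIN_INTERVAL = 0.002      # lower bound that caps the extra bandwidth overhead
COVERAGE_TARGET = 0.95    # desired network-wide coverage rate


def adapt_label_interval(current_interval, coverage_rate):
    """Return the next label interval given the coverage rate just measured."""
    if coverage_rate < COVERAGE_TARGET:
        # Coverage degraded, e.g., labelled packets lost on an unreliable link.
        return max(MIN_INTERVAL, current_interval / 2)
    # Coverage is healthy again: relax back toward the default interval.
    return min(DEFAULT_INTERVAL, current_interval * 2)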
The experiment result folder contains preliminary experimental result data and figures.
The impact of label interval on coverage rate and bandwidth occupation.
The number of packets carrying INT information under different label intervals.
The number of data plane label times under different label intervals.
How the relation between label interval and telemetry resolution affects the coverage rate.
The impact of label interval on coverage rate and INT header bandwidth occupation.
How the network-wide coverage rate changes over time under different data plane label intervals.
Different coverage rates under Base A/B and Pro strategies.
Network-wide coverage degradation due to loss of packets under Base A/B and Pro strategies.
Packet loss rate (due to rate limit) under different label/probe intervals (Base A/B vs HULA).
Distribution of label times.
The number of vantage servers required under FatTree topologies of different scales.
The bandwidth overhead under FatTree topologies of different scales.
The label times distribution of Base A/B from the Theoretical Analysis section.
The Python program for reproducing the results. If you want to obtain a specific numerical result, you can run lines 114-128.
Since the results of Base B are too complicated to print in closed form, we express them in three parts: gens, monoms, and coeffs (see the sketch after this list).
The monoms and coeffs of E(ZB). Ezb.gens:
The monoms and coeffs of P(ZB=1). Pzb1.gens:
The monoms and coeffs of P(ZB=2). Pzb2.gens:
The monoms and coeffs of P(ZB=3). Pzb3.gens:
The monoms and coeffs of P(ZB=4). Pzb4.gens:
The monoms and coeffs of P(ZB=5). Pzb5.gens:
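Assuming these pieces correspond to a sympy polynomial (Poly.gens, Poly.monoms(), and Poly.coeffs()), the short sketch below shows how an expression can be reconstructed from the three parts; the symbols and the toy polynomial are placeholders, not the actual Base B results.

from sympy import Mul, Poly, symbols

# Toy polynomial standing in for the Base B expressions; the symbols p and q
# are placeholders, not the actual variables used in theoretical analysis.
p, q = symbols('p q')
poly = Poly(3*p**2*q + 2*p - 5, p, q)

print(poly.gens)      # (p, q)
print(poly.monoms())  # [(2, 1), (1, 0), (0, 0)]
print(poly.coeffs())  # [3, 2, -5]

# Rebuild the expression from the three parts (gens, monoms, coeffs).
rebuilt = sum(c * Mul(*[g**e for g, e in zip(poly.gens, m)])
              for c, m in zip(poly.coeffs(), poly.monoms()))
assert rebuilt.expand() == poly.as_expr().expand()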
We build an emulation-based network prototype to demonstrate INT-label's performance. The hardware configuration is an i5-8600K CPU and 32 GB of memory, running Ubuntu 16.04. The prototype is based on Mininet and consists of 1 controller, 4 Spine switches, 4 Leaf switches, 4 ToR switches, and 8 servers. INT_label includes the following modules: topology, flow_table, p4_source_code, packet, controller, and TIME_OUT.
Establish the Mininet topology and start the packet send & receive processes.
First, compile the P4 program and establish the Mininet topology; the link bandwidth, delay, maximum queue length, etc. can be configured here. Then initialize the database and start the packet send & receive processes.
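For reference, a minimal sketch of how such link parameters are passed to Mininet's TCLink is shown below; the real clos.py builds the full 4-Spine / 4-Leaf / 4-ToR fabric with P4 (BMv2) switches, so the tiny two-switch topology and names here are assumptions.

#!/usr/bin/env python
# Minimal sketch of how clos.py-style link parameters map onto Mininet's
# TCLink. The topology and names below are illustrative only.
from mininet.net import Mininet
from mininet.node import OVSSwitch
from mininet.link import TCLink
from mininet.cli import CLI


def build():
    net = Mininet(switch=OVSSwitch, link=TCLink, controller=None)
    tor = net.addSwitch('s1', failMode='standalone')
    leaf = net.addSwitch('s2', failMode='standalone')
    h1 = net.addHost('h1')
    h2 = net.addHost('h2')
    # Link bandwidth (Mbit/s), delay and maximum queue length are set here,
    # mirroring the knobs exposed in clos.py.
    net.addLink(h1, tor, bw=10, delay='1ms', max_queue_size=100)
    net.addLink(tor, leaf, bw=10, delay='1ms', max_queue_size=100)
    net.addLink(leaf, h2, bw=10, delay='1ms', max_queue_size=100)
    net.start()
    CLI(net)
    net.stop()


if __name__ == '__main__':
    build()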
Initialize the OpenFlow Pipeline of each OVS.
Generate the flow table.
Update the flow table.
Contains the OpenFlow pipeline.
Contains the P4 source code, which implements the SR-based INT function and the data plane labelling function of INT-label.
Contains the headers, metadata, parser, deparser, and checksum calculator.
The SR-based INT function and the data plane labelling function are implemented in this program.
If you want to switch from Base A to Base B, change 0 to 100 in line 250 of my_int.p4.
If you want to change the function
The JSON file compiled from my_int.p4 by the p4c compiler.
Script for compiling my_int.p4.
Implement packet sending and receiving on the servers.
Send packet.
Based on SR, Server1 and Server8 send data packets to the other servers. The traffic rate and forwarding path can be configured here.
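A minimal, hypothetical sketch of such a sender is shown below using Scapy; the interface name, source-routing port list, packet layout, and rate are assumptions and do not reflect the exact format used by send.py.

# Illustrative sketch of the sender side. The real send.py embeds the SR
# forwarding path in a custom header; names and values below are assumptions.
import time

from scapy.all import Ether, IP, UDP, Raw, sendp

IFACE = 'server1-eth0'       # sending interface on the server (assumed name)
SR_PORTS = bytes([3, 1, 2])  # assumed source-routing port list: ToR -> Leaf -> Spine
RATE_PPS = 100               # traffic rate in packets per second


def send_loop(dst_ip):
    interval = 1.0 / RATE_PPS
    while True:
        pkt = (Ether() / IP(dst=dst_ip) / UDP(sport=5000, dport=5000)
               / Raw(load=SR_PORTS))
        sendp(pkt, iface=IFACE, verbose=False)
        time.sleep(interval)


if __name__ == '__main__':
    send_loop('10.0.0.8')  # e.g., Server1 -> Server8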
Receive packet and parse it.
Extract the INT information.
Receive packets, parse them using parse.py, and write the latest INT information into the INT database and the Aging database (used for calculating the coverage rate).
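The sketch below illustrates this receive-parse-store flow with Scapy and redis-py; the key layout, field names, and interface name are assumptions, and the real parsing is done by parse.py.

# Sketch of the receiver side: sniff packets, extract the INT stack, and keep
# the freshest record per switch in Redis. Names and key layout are assumptions.
import json
import time

import redis
from scapy.all import sniff

r = redis.Redis(host='localhost', port=6379, db=0)


def handle(pkt):
    # parse.py would decode the per-hop INT stack here; we store a dummy record.
    record = {
        'switch_id': 1,
        'queue_depth': 0,
        'queuing_latency': 0,
        'timestamp': time.time(),
    }
    key = 'int:switch:%d' % record['switch_id']
    r.set(key, json.dumps(record))               # latest INT information
    r.set('aging:' + key, record['timestamp'])   # aging entry for the coverage rate


sniff(iface='server8-eth0', prn=handle, store=False)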
Implement the controller-driven adaptive labelling function and calculate the coverage rate.
Implement the function that sets int_sampling_flag to 1 for a period of time.
Restore int_sampling_flag to 0 when necessary.
Calculate the coverage rate (a minimal sketch of this computation appears after the module list below).
Read experimental results.
The flow table used to change int_sampling_flag; it is modified by detect1.py and detect2.py.
Store the global variable used to control the telemetry resolution.
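As referenced above, here is a minimal sketch of how such a coverage rate could be computed from the Aging database; the freshness window, key naming, and link count are assumptions rather than the exact logic of coverage.py.

# Minimal sketch of the coverage-rate computation (assumed semantics): a link
# or switch counts as covered if its latest record in the Aging database is
# fresher than the telemetry resolution window. Constants are illustrative.
import time

import redis

TELEMETRY_RESOLUTION = 0.1  # freshness window in seconds (assumed)
TOTAL_ITEMS = 32            # total number of monitored links/switch ports (assumed)

r = redis.Redis(host='localhost', port=6379, db=0)


def coverage_rate():
    now = time.time()
    covered = 0
    for key in r.keys('aging:*'):
        last_seen = float(r.get(key))
        if now - last_seen <= TELEMETRY_RESOLUTION:
            covered += 1
    return covered / TOTAL_ITEMS


if __name__ == '__main__':
    print('coverage rate: %.2f' % coverage_rate())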
If you have installed the dependencies and configured the database successfully, you can run the system with the commands below:
redis-cli config set notify-keyspace-events KEA
cd controller/
python coverage.py
cd topology/
python clos.py
If you want to switch from Base A to Base B, change 0 to 100 in line 250 of /INT_label/p4_source_code/my_int.p4.
redis-cli config set notify-keyspace-events KEA
cd controller/
python coverage.py
python detect1.py
python detect2.py
cd topology/
python clos.py
You can change the bandwidth, maximum queue size, and background traffic rate in clos.py to test INT-label under different conditions. If you change the topology, you need to modify packet/send/send.py accordingly. You can view the experimental results through controller/read_redis.py.
We also reproduce the code of HULA. The role and usage of each file are similar to those of INT-label.