[RFC] eBPF offload consideration #2

Draft · wants to merge 1 commit into base: master
Conversation

@levaitamas levaitamas commented May 15, 2023

Hi,

@rg0now and I have been investigating how to boost pion/turn performance with eBPF. As a first step, we implemented an eBPF/XDP offload for UDP channel bindings, which lets pion/turn offload ChannelData processing to the kernel. Below we present the implementation details and early results, and we call for a discussion on adding eBPF offload to pion/turn.

Implementation details

How does it work?

The XDP offload handles ChannelData messages only. The userspace TURN server remains responsible for everything else, from creating channel bindings to handling requests. The offload is activated after a successful channel binding, in the method Allocation.AddChannelBind: the userspace TURN server sends the peer and client info (5-tuples and channel id) to the XDP program via an eBPF map. From that point on, the XDP program can detect ChannelData traffic coming from the peer or from the client. When a channel binding is removed, the corresponding entries are deleted from the eBPF maps, so that channel is no longer offloaded.
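To make the AddChannelBind step concrete, here is a minimal sketch of installing a binding into the downstream eBPF map with cilium/ebpf (the map is described in the table below). The struct layouts, the pin path, and the example addresses are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"encoding/binary"
	"log"
	"net"

	"github.com/cilium/ebpf"
)

// fiveTuple mirrors an assumed packed layout of the C map key.
type fiveTuple struct {
	SrcIP, DstIP     uint32
	SrcPort, DstPort uint16
	Protocol         uint32
}

// clientEntry is the assumed map value: client 5-tuple plus channel id.
type clientEntry struct {
	Tuple     fiveTuple
	ChannelID uint32
}

func ip4(s string) uint32 { return binary.BigEndian.Uint32(net.ParseIP(s).To4()) }

func main() {
	// Open the map pinned by the already-loaded XDP program (pin path assumed).
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/turn_server_downstream_map", nil)
	if err != nil {
		log.Fatalf("open pinned map: %v", err)
	}
	defer m.Close()

	peer := fiveTuple{SrcIP: ip4("192.0.2.10"), DstIP: ip4("198.51.100.1"),
		SrcPort: 40000, DstPort: 50000, Protocol: 17}
	client := clientEntry{
		Tuple:     fiveTuple{SrcIP: ip4("203.0.113.7"), DstIP: ip4("198.51.100.1"), SrcPort: 51000, DstPort: 3478, Protocol: 17},
		ChannelID: 0x4000,
	}

	// Install the binding: peer 5-tuple -> client 5-tuple + channel id.
	// Removing a channel binding would call m.Delete(peer) analogously.
	if err := m.Put(peer, client); err != nil {
		log.Fatalf("upsert channel binding: %v", err)
	}
}
```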

Changes to pion/turn

New: We introduce a new internal offload package, which manages offload mechanisms. Currently, there are two implementations: the XDPOffload that uses XDP, and a NullOffload for testing purposes.

Changed: The kernel offload complicates lifecycle management since eBPF/XDP offload outlives TURN server objects. This calls for new public methods in package turn to manage the offload engine's lifetime: InitOffload starts the offload engine (e.g., loads the XDP program and creates eBPF maps) and ShutdownOffload removes the offload engine. Note that these methods should be called by the application as shown in the server_test.go benchmark.

Once everything is set up, channel-binding offload management happens in Allocation.AddChannelBind and Allocation.DeleteChannelBind, with no change to their usage.
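A minimal lifecycle sketch, assuming InitOffload and ShutdownOffload take no arguments and InitOffload returns an error; the real signatures and usage are in the PR's server_test.go benchmark. The point is that the application, not the server object, owns the offload engine's lifetime.

```go
package main

import (
	"log"

	"github.com/pion/turn/v2"
)

func main() {
	// Load the XDP program and create the eBPF maps once, up front.
	if err := turn.InitOffload(); err != nil { // signature assumed for this sketch
		log.Fatalf("offload init failed: %v", err)
	}
	// The offload engine outlives individual TURN server objects, so tear it down last.
	defer turn.ShutdownOffload()

	server, err := turn.NewServer(turn.ServerConfig{ /* usual listener/relay/auth config */ })
	if err != nil {
		log.Fatal(err)
	}
	defer server.Close()

	// ... run until shutdown; from here on Allocation.AddChannelBind and
	// Allocation.DeleteChannelBind install and remove offload entries automatically.
}
```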

eBPF/XDP details

The XDP part consists of a program that implements the packet-processing logic executed when the network interface receives a packet. The XDP program uses eBPF maps to communicate with the userspace TURN server.

Maps: The XDP offload uses the following maps to keep track of connections, store statistics, and to aid traffic redirects between interfaces:

| name | key | value | function |
| --- | --- | --- | --- |
| turn_server_downstream_map | peer 5-tuple | client 5-tuple + channel id | match peer -> client traffic |
| turn_server_upstream_map | client 5-tuple + channel id | peer 5-tuple | match client -> peer traffic |
| turn_server_stats_map | 5-tuple + channel id | stats (#pkts, #bytes) | traffic statistics per connection (5-tuple and channel id) |
| turn_server_interface_ip_addresses_map | interface index | IPv4 address | interface IP addresses for redirects |
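For illustration only, the downstream map could be declared from Go with cilium/ebpf as sketched below; the key/value sizes and MaxEntries are assumptions, and the PR instead builds its Go bindings from the C map definitions via go generate.

```go
package offloadsketch // illustrative package, not the PR's internal/offload

import "github.com/cilium/ebpf"

// newDownstreamMap creates the peer -> client lookup table described above.
func newDownstreamMap() (*ebpf.Map, error) {
	return ebpf.NewMap(&ebpf.MapSpec{
		Name:       "turn_server_downstream_map", // may be truncated to the kernel's name limit
		Type:       ebpf.Hash,
		KeySize:    16,    // packed peer 5-tuple (assumed layout)
		ValueSize:  20,    // packed client 5-tuple + 32-bit channel id (assumed)
		MaxEntries: 65536, // assumed capacity
	})
}
```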

XDP Program: The XDP program receives every packet as it arrives at the network interface. It filters for IPv4/UDP packets (caveat: VLAN and other tunneling options are not supported) and checks whether the packet belongs to a channel binding (i.e., it looks up the 5-tuple and channel id). On a match, the program performs the ChannelData handling: it rewrites the 5-tuple, adds or removes the ChannelData header, updates the statistics, and finally redirects the packet to the corresponding network interface. All other packets are passed up to the network stack for further processing (e.g., channel refresh messages and other STUN/TURN traffic go to the userspace TURN server).

Results

CPU profiling

Preliminary results are promising. CPU profiling with the benchmark (pion#298) shows that server.ReadLoop(), which took 47.9 s without the offload, runs for 0.96 s with the XDP offload.

Flame graph w/o the offload: (image: No_offload)

Flame graph w/ XDP offload: (image: XDP_offload)

Microbenchmark with simple-server

Measurements with iperf, turncat (our in-house TURN proxy), and the simple-server example show an outstanding (150x!) delay reduction and a significant (6x) bandwidth boost.

Measurement setup

Delay results

| [ms] | simple | multi | xdp |
| --- | --- | --- | --- |
| avg | 3.944 | 4.311 | 0.033 |
| min | 3.760 | 0.473 | 0.023 |
| median | 3.914 | 4.571 | 0.027 |
| max | 4.184 | 5.419 | 0.074 |

Bandwidth results

Note: iperf stalls at ~220 kpps; we assume 1+ Mpps would be achievable with a more powerful load generator.

| [pps] | simple | multi | xdp |
| --- | --- | --- | --- |
| avg | 36493 | 96152 | 227378 |
| min | 35241 | 91856 | 222567 |
| median | 36617 | 96843 | 227783 |
| max | 37545 | 99455 | 233559 |

Discussion

  • XDP offload is straightforward for UDP connections, but is cumbersome for TCP and TLS. Fortunately, the eBPF ecosystem provides other options: tc and sockmap can be potential alternatives with a reasonable complexity-performance trade-off.
    • Yet we need to coordinate the different offload mechanisms for different connections.
    • In addition, offload mechanisms introduce a new lifecycle-management concern: these mechanisms outlive TURN server objects.
  • The eBPF objects need to be built and distributed, which makes the build process more complex.
    • New dependency: cilium/ebpf.
    • The build process gets more complex: eBPF objects are built via go generate; how should this be integrated with the current build process (e.g., should we add a Makefile)?
  • Monitoring is not trivial, due to the lifetime of XDP objects and because in XDP connections are identified by 5-tuples, so we lose the notion of 'listeners'.
    • Therefore, the current monitoring implementation is only a first step. The bytes and packets sent over a 5-tuple are stored in a statistics eBPF map. We update the counters in the statistics map, but we never delete from it. There is no interface exposed for querying statistics (one can use bpftool to dump the map content; see the sketch after this list).
  • XDP limitations: bpf_redirect(), which handles packet redirects in eBPF/XDP, only supports redirecting packets to a NIC's egress queue. This prevents scenarios in which clients exchange traffic in a server-local 'loop'.
    • We had issues forwarding traffic between NICs running the native xdp driver and NICs running the xdpgeneric driver (except for the lo interface), so we disabled the XDP offload for host-local redirects.
    • A packet size limit (currently 1480 bytes) is enforced in the XDP program to prevent fragmentation.
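To make the monitoring point concrete, here is a sketch of dumping the statistics map from Go with cilium/ebpf; the pin path and the key/value layouts are assumptions, and bpftool map dump achieves the same from the CLI.

```go
package main

import (
	"fmt"
	"log"

	"github.com/cilium/ebpf"
)

// statsKey mirrors an assumed layout of the stats map key (5-tuple + channel id).
type statsKey struct {
	SrcIP, DstIP     uint32
	SrcPort, DstPort uint16
	Protocol         uint32
	ChannelID        uint32
}

// statsValue mirrors an assumed layout of the per-connection counters.
type statsValue struct {
	Pkts  uint64
	Bytes uint64
}

func main() {
	// Pin path assumed; bpftool can dump the same map from the command line.
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/turn_server_stats_map", nil)
	if err != nil {
		log.Fatalf("open stats map: %v", err)
	}
	defer m.Close()

	var (
		k statsKey
		v statsValue
	)
	iter := m.Iterate()
	for iter.Next(&k, &v) {
		fmt.Printf("channel 0x%x: %d pkts, %d bytes\n", k.ChannelID, v.Pkts, v.Bytes)
	}
	if err := iter.Err(); err != nil {
		log.Fatalf("iterating stats map: %v", err)
	}
}
```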

@levaitamas levaitamas requested a review from rg0now May 15, 2023 13:02
rg0now commented May 16, 2023

This is going to be almost perfect, thanks! One tiny request: can you please add a short summary of the implementation? The maps we create, when and how we manage map entries, etc. Also, a note on how everything except ChannelData packets is handled in the userspace TURN server, and on the reverse path.

I'll try to go through the code later, but don't wait for me; feel free to file the PR any time you see fit.

@levaitamas (Member, Author) commented

Thanks @rg0now! Made some edits in the OP. WDYT?

@levaitamas levaitamas force-pushed the server-ebpf-offload branch 2 times, most recently from 37c028c to c8ba98d on May 19, 2023 15:32
@@ -0,0 +1,50 @@
FROM golang:latest

Stupid question: why aren't we doing a multi-stage build? Was it easier this way, or is there some fundamental reason why this can't be done with eBPF?

client := offload.NewConnection(a.fiveTuple.SrcAddr.(*net.UDPAddr), a.fiveTuple.DstAddr.(*net.UDPAddr), uint32(c.Number))
err := offload.Engine.Upsert(peer, client, []string{})
if err != nil {
offload.Engine.Logger().Errorf("failed to init offload between peer: %+v and client: %+v due to: %s", peer, client, err)

The pion log format requires all log messages to start with a capital letter... :-( Steffen always complains about my PRs for this same reason... :-) Also, please avoid %v-style verbs in log messages: use proper stringification instead and carefully format all log messages.


// disable offload
if offload.Engine != nil {
// TODO: use FiveTuple + int channelid

Is this comment still meaningful?

// disable offload
if offload.Engine != nil {
// TODO: use FiveTuple + int channelid
peer := offload.NewConnection(cAddr.(*net.UDPAddr), a.RelayAddr.(*net.UDPAddr), uint32(number))

Can we make NewConnection take general net.Addrs instead of UDP addresses?

return err
}

o.log.Debugf("XDP: remove offload between peer: %+v and client: %+v", p, c)

Please remove %v everywhere and make this into a properly stringified message; it is extremely important that we can trace this info in our log messages! Also, please report the stats on shutdown (transmitted: x pkts, y bytes).

return err
}

o.log.Debugf("XDP: create offload between peer: %+v and client: %+v", p, c)

Remove %v and properly report which 5-tuple was shortcut to which 5-tuple for which protocol and channel id.

}

// GetStat queries statistics about an offloaded connection
func (o *XdpEngine) GetStat(con Connection) error {

Let us refer to an offload with a handle. I think it is a terrible public API to make the user always mess with 5-tuples to ask for stats: let the GetStat signature be:

func (o *XdpEngine) GetStat(handle int) (Stat, error) {...}

where Stat should be our stats. Do not print stats, return them to the user!
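For illustration, a sketch of what this handle-based API could look like; the Stat fields, the handles and statsMap fields, and the Connection layout are assumptions for this sketch, not the PR's actual code.

```go
package offload // sketch only; minimal stand-ins for the PR's real types

import (
	"fmt"

	"github.com/cilium/ebpf"
)

// Stat holds per-offload traffic counters read from the statistics map.
type Stat struct {
	Pkts  uint64
	Bytes uint64
}

// Connection stands in for the offload key (5-tuple + channel id).
type Connection struct {
	SrcIP, DstIP     uint32
	SrcPort, DstPort uint16
	Protocol         uint32
	ChannelID        uint32
}

// XdpEngine sketch: only the fields needed for GetStat are shown.
type XdpEngine struct {
	statsMap *ebpf.Map          // turn_server_stats_map
	handles  map[int]Connection // handle -> offload key, assigned when the offload is created
}

// GetStat returns the counters for the offload identified by handle,
// so callers never have to reconstruct 5-tuples themselves.
func (o *XdpEngine) GetStat(handle int) (Stat, error) {
	key, ok := o.handles[handle]
	if !ok {
		return Stat{}, fmt.Errorf("unknown offload handle %d", handle)
	}
	var s Stat
	if err := o.statsMap.Lookup(key, &s); err != nil {
		return Stat{}, err
	}
	return s, nil
}
```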

}

// Remove removes an XDP offload between a peer and a client
func (o *XdpEngine) Remove(peer, client Connection) error {

Make Remove take an offload handle instead of 5-tuples.


rg0now commented May 20, 2023

I'm not sure about the UDP checksum issue: what's the status now? We should not try to upstream the code until we find out how to generate correct checksums.

@levaitamas levaitamas changed the title [RFC] eBPF offload cosideration [RFC] eBPF offload consideration Jul 6, 2023
@levaitamas levaitamas force-pushed the server-ebpf-offload branch 2 times, most recently from d46cb08 to 856dc91 on October 9, 2023 12:29
@levaitamas levaitamas force-pushed the server-ebpf-offload branch 2 times, most recently from 3d0fc0e to 09b48ae on November 16, 2023 11:10
@levaitamas levaitamas force-pushed the server-ebpf-offload branch 2 times, most recently from 4ac0f62 to 5304857 on November 24, 2023 12:33
@levaitamas levaitamas force-pushed the server-ebpf-offload branch 2 times, most recently from b59fd8f to 07bfae9 on December 1, 2023 14:26
@levaitamas levaitamas force-pushed the server-ebpf-offload branch 3 times, most recently from 120127b to 4b776f2 on June 17, 2024 21:19
@levaitamas levaitamas force-pushed the server-ebpf-offload branch 2 times, most recently from 4b776f2 to c5bdc92 on July 5, 2024 15:18