-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for SKIP-RX-COPY using MSG_TRUNC and Zero-copy using SO_ZEROCOPY/MSG_ZEROCOPY #1690
Conversation
Thanks David, this looks very promissing. Follows my results on a 100G back to back test environment: Server command: sudo taskset --cpu-list 1 ./src/iperf3 -s -i 1 sudo taskset --cpu-list 1 ./src/iperf3 -c 192.168.1.18 -i 1 -t 30 -V --skip-rx-copy sudo taskset --cpu-list 1 ./src/iperf3 -c 192.168.1.18 -i 1 -t 30 -V --skip-rx-copy --zerocopy=z sudo taskset --cpu-list 1 ./src/iperf3 -c 192.168.1.18 -i 1 -t 30 -V --skip-rx-copy -Z |
This is fantastic! Thanks! I get 2x throughput on a 100G path. Question: Shouldn't --skip-rx-copy be a server side option instead of a client side? (or both if using the -R option) |
The client send this option (as several of the other options) to the server, so setting it for a test is applicable for both normal and reverse ( |
Ah, that makes sense. Thanks. I do see a use case where I might want to force it on the server side, but passing the option from the client is more useful. |
I now understand why you asked to have the option on the server side, a use
case that I didn't think about. I am not sure it is better to try adding
such enhancement to this PR, or wait first that this (or other similar) PR
will me merged and then suggest the enhancement by additional PR.
UPDATE: for now, I will not add the `skip-rx-copy` to the server options, unless the reviewers will recommend differently. This is because such setting will have different behavior for the server as a receiver and for the client as a receiver.
…On Fri, May 3, 2024 at 8:10 PM Brian Tierney ***@***.***> wrote:
Ah, that makes sense. Thanks. I do see a use case where I might want to
force it on the server side, but passing the option from the client is more
useful.
—
Reply to this email directly, view it on GitHub
<#1690 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AOSCPP2NT3L7K4JHPEKHOU3ZAPAILAVCNFSM6AAAAABG4FECFWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOJTGQZTEOBYGE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Thanks for the pull request! We're gonna need to study this a bit. |
Having read through the patch (haven't run anything yet) and associated Linux documentation:
https://man7.org/linux/man-pages/man2/recv.2.html |
src/iperf_api.c
Outdated
// sero copy for TCP use sendfile() | ||
if (test->zerocopy && test->protocol->id != Pudp && !has_sendfile()) { | ||
i_errno = IENOSENDFILE; | ||
return -1; | ||
} | ||
#else | ||
// sero copy is supported only by TCP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// sero copy for TCP use sendfile() | |
if (test->zerocopy && test->protocol->id != Pudp && !has_sendfile()) { | |
i_errno = IENOSENDFILE; | |
return -1; | |
} | |
#else | |
// sero copy is supported only by TCP | |
// zero copy for TCP use sendfile() | |
if (test->zerocopy && test->protocol->id != Pudp && !has_sendfile()) { | |
i_errno = IENOSENDFILE; | |
return -1; | |
} | |
#else | |
// zero copy is supported only by TCP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed (with other changes - see comments)
Why "things set-up in a way that I would expect only TCP throughput to be improved"? Although practically that may be the case, what is wrong in the implementation regarding UDP? (Both UDP and TCP use
Forgot to take |
If I understand the Linux kernel documentation correctly, With TCP it will discard With UDP it will discard everything after This means that we only have to read the first part of UDP packet to get the UDP stats the server sticks in there: --- a/src/iperf_udp.c
+++ b/src/iperf_udp.c
@@ -69,6 +69,9 @@ iperf_udp_recv(struct iperf_stream *sp)
sock_opt = 0;
#endif /* HAVE_MSG_TRUNC */
+ if (sock_opt)
+ size = sizeof(sec) + sizeof(usec) + sizeof(pcount);
+
r = Nrecv(sp->socket, sp->buffer, size, Pudp, sock_opt);
/* --- a/src/net.c
+++ b/src/net.c
@@ -397,8 +397,8 @@ Nread(int fd, char *buf, size_t count, int prot)
int
Nrecv(int fd, char *buf, size_t count, int prot, int sock_opt)
{
- register ssize_t r;
- register size_t nleft = count;
+ register ssize_t r, total=0;
+ register ssize_t nleft = count;
struct iperf_time ftimeout = { 0, 0 };
fd_set rfdset;
@@ -441,6 +441,7 @@ Nrecv(int fd, char *buf, size_t count, int prot, int sock_opt)
} else if (r == 0)
break;
+ total += r;
nleft -= r;
buf += r;
@@ -477,7 +478,7 @@ Nrecv(int fd, char *buf, size_t count, int prot, int sock_opt)
}
}
}
- return count - nleft;
+ return total;
}
Doing a quick loopback test on my system shows about a 20% increase in throughput for large UDP packets. On a quick tangent, it looks like with this change UDP tests are pretty much dominated by the overhead of select. |
I wasn't aware of this in my original comment but since |
95ee7db
to
abf359a
Compare
Thanks a lot! I completely missed that point. I now added you suggested change to The new commit is with rebase, and I forgot to mention previously that the changes include the PR #1708 fix, to reduce server's CPU overhead.
As you probably understood, I somehow overlooked the UDP dynamic prefix of a packet ... For the receiving side you solved the issue with the above. For the sending side it seems that the only solution is using the MSG_ZEROCOPY Notifications, but I don't want to add this complexity at this point. It seems that initially it is better to just not implement zero copy for UDP. Before doing that, do you agree? Do you have other suggestion? |
I agree that The math for count = 16;
nleft = count;
...
nleft -= 65507; // nleft = (-65491)
...
return count - nleft; // 16 - (-65491) = 65507 (which does account for all the bits lol) |
The changes to support SKIP-RX-COPY where moved to PR #1717, and this PR will be used only for the support of zero-copy using SO_ZEROCOPY/MSG_ZEROCOPY.
I am not sure I realized that
Will add support for the notifications in this PR. |
@MattCatz Hi MattCatz, what were you using for the "flame graphs" for performance? Thanks! |
I didn't write down the exact steps but probably something like:
If that doesn't work then feel free to reach out over email (you can find it in the git log). |
@MattCatz Thanks for the instructions! I'll give it a shot! |
Version of iperf3 (or development branch, such as
master
or3.1-STABLE
) to which this pull request applies:master
Issues fixed (if any): feature request: support for MSG_ZEROCOPY and MSG_TRUNC flags #1678
Brief description of code changes (suitable for use as a commit message):
Add support for SKIP-RX-COPY (using MSG_TRUNC) and SO_ZEROCOPY/MSG_ZEROCOPY. Although it is not clear that all added functionality improve performance and throughput, support for all these options and their combinations was added, to allow testing all of them. The assumptions is that different environments may have different levels of support for the different options and their combinations.
(Note that running
bootstrap.sh; configure
is required beforemake
to support the new features.)The added options are:
--skip-rx-copy
: when used, for both TCP and UDP,recv(..., MSG_TRUNC)
is used instead ofread()
.MSG_ZEROCOPY
. When used, socket optionSO_ZEROCOPY
is set andsend(...., MSG_ZEROCOPY)
is used instead ofwrite()
.MSG_ZEROCOPY
is used in the following cases:2.1 UDP: when
-Z/--zerocopy
option is set.2.2 TCP: when
--zerocopy=z
is set. Otherwise,sendfile()
continue to be used for TCP zero copy.