Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: tcp: Implement TCP new Reno collision avoidance #60112

Merged

Conversation

ssharks
Copy link
Collaborator

@ssharks ssharks commented Jul 6, 2023

To avoid a TCP connection from collapsing a link, implement a collision avoidance algorithm. Initially TCP new Reno is implemented for its simplicity.

@ssharks
Copy link
Collaborator Author

ssharks commented Jul 6, 2023

A initial attempt to implement collision avoidance for TCP. I'm thinking to put the collision avoidance in a separate file, I think the Linux way using tcp_congestion_ops and registering collision avoidance algorithms is a little overkill, but I can imagine that multiple choices will be available in the future that can be selected through a configure option.

Feel free to comment or take this as a starting point for further development. I'll be limited available the coming weeks, but feedback is very welcome.

@ssharks
Copy link
Collaborator Author

ssharks commented Jul 6, 2023

Attempts to implement: #46352

@ssharks ssharks force-pushed the net/tcp_collision_avoidance branch from 6b81b8d to 647302c Compare July 6, 2023 20:47
@ssharks ssharks force-pushed the net/tcp_collision_avoidance branch 2 times, most recently from 3cd17ea to 0dc74b0 Compare July 19, 2023 07:05
@ssharks
Copy link
Collaborator Author

ssharks commented Jul 19, 2023

@rlubos
During the style check I get:
0dc74b0:45: ERROR:API_DEFINE: do not specify a non-Zephyr API for libc
#45: FILE: subsys/net/ip/tcp.c:66:+#define TCP_CONGESTION_INITIAL_WIN 1
Do you know what I should do in order to prevent this style violation?

@rlubos
Copy link
Contributor

rlubos commented Jul 19, 2023

@ssharks Looks like a checkpatch configuration bug I've seen in other PR as well, a fix has been submitted: #60551 - can you rebase and see if it helps?

@ssharks ssharks force-pushed the net/tcp_collision_avoidance branch from 0dc74b0 to 14f9fd6 Compare July 20, 2023 22:23
@ssharks
Copy link
Collaborator Author

ssharks commented Jul 21, 2023

To avoid a TCP connection from collapsing a link, implement a collision avoidance algorithm. Initially TCP new Reno is implemented for its simplicity.

Other congestion avoidance algorithms like the Linux default CUBIC will perform better on long fat pipes. As the Zephyr TCP stack supports only 64K transmit window, the difference is likely neglectable even for long latencies.

@ssharks ssharks marked this pull request as ready for review July 21, 2023 14:22
@ssharks ssharks requested review from jukkar and removed request for tbursztyka and rlubos July 21, 2023 14:23
@ssharks ssharks requested review from a user and aaillet and removed request for aaillet July 21, 2023 14:23
@ssharks
Copy link
Collaborator Author

ssharks commented Jul 21, 2023

Please have a look at the form factor. Currently I've put it all in tcp.c if we would put it in a seperate file it would be possible to switch between different algorithms compile time. The specific CA algorithm will just fill in the functions.

I see I need some small changes to properly implement the rfc3782

@ssharks ssharks force-pushed the net/tcp_collision_avoidance branch 2 times, most recently from bb85c11 to 5873c12 Compare July 22, 2023 20:26
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a very useful addition - thx! Unfortunately don't know enough for a full review. Just one minor suggestion.

subsys/net/ip/tcp.c Outdated Show resolved Hide resolved
subsys/net/ip/tcp.c Outdated Show resolved Hide resolved
subsys/net/ip/tcp.c Outdated Show resolved Hide resolved
subsys/net/ip/tcp.c Show resolved Hide resolved
@@ -456,6 +456,14 @@ config NET_TCP_FAST_RETRANSMIT
In that case a retransmission is triggerd to avoid having to wait for
the retransmit timer to elapse.

config NET_TCP_CONGESTION_AVOIDANCE
bool "Implement a collision avoidance algorithm in TCP"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is collision avoidance also a commonly used term in this regard? I didn't find too many references to be honest. Maybe we should stick to congestion avoidance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this fixed? I see the prompt still says Implement a collision avoidance algorithm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, fix attempt 2


if (conn->ca.state == TCP_NEW_RENO_RAMPUP) {
new_win += conn_mss(conn);
conn->ca.congestion_win += conn_mss(conn);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unneeded? conn->ca.congestion_win is overwritten just below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code has fully been reworked here

subsys/net/ip/tcp.c Outdated Show resolved Hide resolved
@ssharks
Copy link
Collaborator Author

ssharks commented Jul 27, 2023

During testing I bumped into a corner case, which gave me random failing unit tests. A PR for this issue is submitted at:
#60880

@ssharks ssharks force-pushed the net/tcp_collision_avoidance branch from 5873c12 to 4cb1c54 Compare July 27, 2023 15:28
@zephyrbot zephyrbot added the area: Sockets Networking sockets label Jul 27, 2023
@ssharks ssharks requested review from a user and rlubos July 28, 2023 05:37
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now wrt my superficial remark above, still no full review, though, as I don't understand enough of what's going on there conceptually.

Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some nits. Did you have a chance to run any performance test (like zperf) to see what is the impact on the actual throughput? I guess it should be negligible but it would be good to see if that's really the case.

@@ -456,6 +456,14 @@ config NET_TCP_FAST_RETRANSMIT
In that case a retransmission is triggerd to avoid having to wait for
the retransmit timer to elapse.

config NET_TCP_CONGESTION_AVOIDANCE
bool "Implement a collision avoidance algorithm in TCP"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this fixed? I see the prompt still says Implement a collision avoidance algorithm

subsys/net/ip/tcp.c Outdated Show resolved Hide resolved
@ssharks ssharks force-pushed the net/tcp_collision_avoidance branch from 4cb1c54 to 0104b36 Compare July 28, 2023 11:57
@ssharks
Copy link
Collaborator Author

ssharks commented Jul 28, 2023

Looks good, just some nits. Did you have a chance to run any performance test (like zperf) to see what is the impact on the actual throughput? I guess it should be negligible but it would be good to see if that's really the case.

With respect to throughput, the only thing I can see is that the unit test does not get significantly slower. I do not have HW which I can run a zperf on saidly, so cannot test that.

To avoid a TCP connection from collapsing a link, implement a collision
avoidance algorithm. Initially TCP new Reno is implemented for its
simplicity.

Signed-off-by: Sjors Hettinga <s.a.hettinga@gmail.com>
To allow insighed into the correct functioning of the collision avoidance,
log the internal values and function calls.

Signed-off-by: Sjors Hettinga <s.a.hettinga@gmail.com>
@ssharks ssharks force-pushed the net/tcp_collision_avoidance branch from 0104b36 to 4f85ff3 Compare July 28, 2023 13:12
@ssharks
Copy link
Collaborator Author

ssharks commented Jul 28, 2023

Performed a rebase to trigger a rebuild of the CI/CD

@ssharks
Copy link
Collaborator Author

ssharks commented Jul 28, 2023

@jukkar:
During testing, I bumped into an issue regarding line:

zephyr/subsys/net/ip/tcp.c

Lines 2795 to 2797 in 488fd89

(void)k_work_schedule_for_queue(&tcp_work_q,
&conn->send_data_timer,
K_NO_WAIT);

When the transmit window is full, which happens much more often with congestion avoidance, the resend timer is being triggered by new transmissions. This causes a lot of spurious retransmissions.
It looks like you have created this line, do you still remember the purpose of it?

@ssharks
Copy link
Collaborator Author

ssharks commented Jul 31, 2023

@jukkar: During testing, I bumped into an issue regarding line:

zephyr/subsys/net/ip/tcp.c

Lines 2795 to 2797 in 488fd89

(void)k_work_schedule_for_queue(&tcp_work_q,
&conn->send_data_timer,
K_NO_WAIT);

When the transmit window is full, which happens much more often with congestion avoidance, the resend timer is being triggered by new transmissions. This causes a lot of spurious retransmissions. It looks like you have created this line, do you still remember the purpose of it?

A PR to remove this is already available: #60972

@rlubos
Copy link
Contributor

rlubos commented Aug 1, 2023

@ssharks FYI, I've checked with zperf (1s/10s throughput test) and I see no noticable changes in the TCP throughput.

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@carlescufi carlescufi merged commit 8cd5d6f into zephyrproject-rtos:main Aug 4, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants