
[Bug] Failure to receive large samples #291

Closed
gmartin82 opened this issue Dec 6, 2023 · 13 comments
Labels: bug (Something isn't working)

Comments

@gmartin82
Contributor

gmartin82 commented Dec 6, 2023

Describe the bug

In client mode, I have encountered an issue with receiving samples >= 32 KB. Samples appear to be published successfully to a Zenoh router but are not received by a subscribing application. The exact size that triggers the issue seems to vary from application to application.

To reproduce

Start a Zenoh router:

./target/release/zenohd -l tcp/127.0.0.1:7447 --no-multicast-scouting

Start a Zenoh Pico subscriber:

./build/examples/z_sub -e tcp/127.0.0.1:7447

Start a Zenoh Pico publisher with large, randomly generated character data:

rand=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 65536 | head -n 1)
./build/examples/z_pub -e tcp/127.0.0.1:7447 -v "$rand"

Observe that samples are not received by z_sub.
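
For reference, here is a minimal variant of the bundled z_sub example that prints only the size of each received payload, which makes the threshold at which samples stop arriving easier to spot. This is a sketch assuming the zenoh-pico C API as of late 2023; the key expression and locator match the commands above:

#include <stdio.h>
#include <zenoh-pico.h>

// Print only the size of each sample, so missing or truncated large payloads are easy to spot.
void data_handler(const z_sample_t *sample, void *ctx) {
    (void)ctx;
    printf(">> Received sample of %zu bytes\n", sample->payload.len);
}

int main(void) {
    z_owned_config_t config = z_config_default();
    zp_config_insert(z_loan(config), Z_CONFIG_MODE_KEY, z_string_make("client"));
    zp_config_insert(z_loan(config), Z_CONFIG_CONNECT_KEY, z_string_make("tcp/127.0.0.1:7447"));

    z_owned_session_t s = z_open(z_move(config));
    if (!z_check(s)) {
        printf("Unable to open session!\n");
        return -1;
    }
    // Background tasks that read from the network and keep the session alive.
    zp_start_read_task(z_loan(s), NULL);
    zp_start_lease_task(z_loan(s), NULL);

    z_owned_closure_sample_t callback = z_closure(data_handler);
    z_owned_subscriber_t sub = z_declare_subscriber(z_loan(s), z_keyexpr("demo/example/**"), z_move(callback), NULL);
    if (!z_check(sub)) {
        printf("Unable to declare subscriber!\n");
        return -1;
    }

    while (getchar() != 'q') {
        // Wait; samples are delivered on the read task's thread.
    }

    z_undeclare_subscriber(z_move(sub));
    zp_stop_read_task(z_loan(s));
    zp_stop_lease_task(z_loan(s));
    z_close(z_move(s));
    return 0;
}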

System info

Platform: Ubuntu 22.04 64-bit
CPU: Intel i7-1260P
Zenoh version: master (eclipse-zenoh/zenoh@9f7a37e)
Zenoh Pico version: master (ae3c237)

gmartin82 added the bug label on Dec 6, 2023
@gmartin82
Contributor Author

This may be a compatibility issue with Zenoh master, since it works with Zenoh 0.10.0-rc. Increasing Z_FRAG_MAX_SIZE allows progressively larger sample sizes to be sent successfully.

@jean-roland
Contributor

Confirmed: this looks like a regression in Zenoh introduced by hop-to-hop compression.

@Mallets
Member

Mallets commented Dec 18, 2023

The size of the RX buffer should depend on whether the link is streamed or not. If it is streamed, we also need to account for the 2 bytes indicating the batch length. The compression PR fixed that behaviour in Zenoh Rust; however, by doing so, it broke fragmentation compatibility with Zenoh-Pico, where those 2 bytes are not accounted for.

As a temporary fix, I've created this PR: eclipse-zenoh/zenoh#630 .
Nevertheless, I believe Zenoh-Pico should be updated to account for those 2 bytes. I think the right place would be here:

case Z_LINK_CAP_FLOW_STREAM:

Some work may be required to overcome the uint16_t limitation:

uint16_t mtu = (zl->_mtu < Z_BATCH_UNICAST_SIZE) ? zl->_mtu : Z_BATCH_UNICAST_SIZE;
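
For illustration, here is a minimal sketch of what accounting for the length prefix could look like. The helper name and the is_streamed flag are assumptions, not the actual Zenoh-Pico API; note the wider size_t result, which sidesteps the uint16_t limitation when the MTU is already 65535:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef Z_BATCH_UNICAST_SIZE
#define Z_BATCH_UNICAST_SIZE 65535 /* normally comes from zenoh-pico's config header */
#endif

#define _Z_STREAM_LEN_SIZE 2 /* 2-byte batch-length prefix on streamed links */

/* Hypothetical helper: size the RX buffer for a link. On streamed links
 * (e.g. TCP) every batch is preceded by a 2-byte length field, so the
 * buffer must hold 2 extra bytes on top of the batch itself. Returning
 * size_t instead of uint16_t avoids overflow at the 65535 boundary. */
static size_t _z_rx_buffer_size(uint16_t link_mtu, bool is_streamed) {
    size_t batch = (link_mtu < Z_BATCH_UNICAST_SIZE) ? link_mtu : Z_BATCH_UNICAST_SIZE;
    return is_streamed ? (batch + _Z_STREAM_LEN_SIZE) : batch;
}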

@Mallets
Member

Mallets commented Dec 18, 2023

I've tested 479c173 and it works against eclipse-zenoh/zenoh@780ec60.
Once #297 lands, this issue should be solved. In the meantime, I'm waiting to merge eclipse-zenoh/zenoh#630.

@Mallets
Member

Mallets commented Dec 18, 2023

#297 has been merged. Before closing this issue, @gmartin82 can you please verify on your side that everything is now working as expected?

@gmartin82
Contributor Author

I've tried a range of sample sizes from 8 bytes to 512 MB in client mode and everything appears to be working as expected with the master versions of Zenoh and Zenoh Pico.

However, I am still experiencing communication issues between publishers and subscribers in peer mode. After some samples are sent successfully, communication freezes. It doesn't happen every time, but it happens often.

@Mallets
Member

Mallets commented Dec 18, 2023

Peer mode over UDP multicast doesn't have any reliability, so it's normal to lose data when messages are fragmented. Do you see data going over the network, e.g. with Wireshark?

@gmartin82
Contributor Author

Yes, I can see data flowing on the network with Wireshark.

If I disable reliability, Cyclone DDS exhibits the same behavior, so I think you're right that this is due to UDP multicast lacking reliability.

@gmartin82
Contributor Author

gmartin82 commented Dec 19, 2023

I reran my tests overnight on my faster desktop machine and the results still show some issues. Many of the earlier tests (sample sizes < 64 KB) failed, and the later tests (> 256 KB) also failed.

Zenoh commit: 788172b0360caa38db24ec2d238734a2ebf40d32
Zenoh Pico commit: 63168384689bc5846cf9fdb90743824768963fdd

@Mallets
Member

Mallets commented Dec 19, 2023

I think something weird is going on in the tests for payloads < 64 KB.
For the tests > 256 KB, be aware that by default Zenoh-Pico does not accept any message larger than 300 KB. You should modify this parameter to accept larger messages:

#define Z_FRAG_MAX_SIZE 300000
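
For example, to cover the 512 MB tests mentioned above, the default could be raised in Zenoh-Pico's configuration header. The value below is an assumption; size it to the largest payload you expect to receive:

/* In zenoh-pico's config header: maximum size, in bytes, of a fragmented
 * message that Zenoh-Pico will reassemble; larger messages are not
 * accepted. Illustrative value: 512 MB = 512 * 1024 * 1024. */
#define Z_FRAG_MAX_SIZE 536870912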

@gmartin82
Contributor Author

I think you're right for sample sizes > 256 KB. I hadn't overridden the maximum fragment size.

@jean-roland
Contributor

Should we close this issue?

@gmartin82
Contributor Author

Closing as it worked as expected once the maximum fragment size was increased.
