
ENH: Add POC async implementation, example using storescp #542

Merged · 49 commits · Oct 23, 2024

Conversation

@naterichman (Contributor)

POC async implementation.

  • Created a feature flag tokio which controls whether read_pdu and write_pdu are imported from read.rs/write.rs or from read_nonblocking.rs/write_nonblocking.rs (sketched below)
  • Created feature-gated implementations of the methods in Client and Server that depend on read_pdu/write_pdu, or that read from/write to the socket directly.
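
For illustration, a minimal sketch of what that gating might look like (the module paths are assumed from the bullet above, not necessarily the PR's exact layout):

// Hypothetical ul/src/pdu/mod.rs: re-export the PDU I/O functions from
// either the blocking or the non-blocking module, selected at compile time.
#[cfg(not(feature = "tokio"))]
pub use self::read::read_pdu;
#[cfg(not(feature = "tokio"))]
pub use self::write::write_pdu;

#[cfg(feature = "tokio")]
pub use self::read_nonblocking::read_pdu;
#[cfg(feature = "tokio")]
pub use self::write_nonblocking::write_pdu;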

A couple other notes:

  • We could consider using the maybe_async crate.
    • Since most of the code in read_pdu/write_pdu is duplicated between the sync and async implementations, I thought this crate could be nice to use, but I found that I'd have to extend the macro to adjust the trait bounds for each implementation, and I couldn't figure out the best way to do that
  • I had to change the internal API of write_chunk_u16 and write_chunk_u32 a little
  • For now I completely ignored PDataReader and PDataWriter, since I'm not exactly clear on where they should or shouldn't be used (storescp doesn't use them at all, and in my implementation of C-FIND I didn't end up using either one)
  • I only just added the timeouts in another PR, but it seems the concept of socket-level timeouts doesn't exist in an async context. Future work will therefore be adding timeouts to the individual methods (send, receive, establish, etc.) via something like tokio::time::timeout; see the sketch after this list.
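
For illustration, a per-operation timeout in async code could look roughly like this (a sketch; with_timeout is a hypothetical helper, not part of this PR):

use std::future::Future;
use std::io;
use std::time::Duration;
use tokio::time::timeout;

// Wrap any async operation (send, receive, establish, ...) in a deadline,
// mapping an elapsed timer to a std::io timeout error.
async fn with_timeout<T>(
    dur: Duration,
    op: impl Future<Output = io::Result<T>>,
) -> io::Result<T> {
    timeout(dur, op)
        .await
        .map_err(|_| io::Error::new(io::ErrorKind::TimedOut, "operation timed out"))?
}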

Also note that this is meant to be just one possible implementation, and I fully expect it to be rewritten however you want it to look! Thanks in advance!

@naterichman (Contributor, Author)

I've been doing some more reading up, and unfortunately it seems like the options are

  1. Repeat a bunch of code, either manually or with a nice macro, to write truly sync and async versions of ul; but that (like the current PR) would break users expecting cargo install dicom-ul --all-features to work.
  2. Make the code async-only and expose a blocking client that simply calls block_on (a sketch follows).
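
For illustration, option 2 could look roughly like this (AsyncClient, Pdu, and the method set are stand-ins for a hypothetical async API, not the crate's actual one):

use tokio::runtime::Runtime;

// Hypothetical blocking facade over an async client: a dedicated runtime
// drives each async call to completion with block_on.
pub struct BlockingClient {
    rt: Runtime,
    inner: AsyncClient,
}

impl BlockingClient {
    pub fn connect(addr: &str) -> std::io::Result<Self> {
        let rt = Runtime::new()?;
        let inner = rt.block_on(AsyncClient::connect(addr))?;
        Ok(BlockingClient { rt, inner })
    }

    pub fn send(&mut self, pdu: Pdu) -> std::io::Result<()> {
        // block_on runs the future on the embedded runtime until it resolves.
        self.rt.block_on(self.inner.send(pdu))
    }
}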

@Enet4 (Owner) commented Jul 21, 2024

Much appreciated!

I've been doing some more reading up, and unfortunately it seems like the options are

  1. Repeat a bunch of code, either manually or with a nice macro, to write truly sync and async versions of ul; but that (like the current PR) would break users expecting cargo install dicom-ul --all-features to work.
  2. Make the code async-only and expose a blocking client that simply calls block_on.

There is bound to be a bit of redundancy in this process, unfortunately. Whether to go async or not entails a different function "color", which affects its inner workings altogether.

At best, there is a way to reduce the amount of redundancy: write non-blocking and non-polling implementations of the PDU readers. While this can be a bit trickier than the existing logic in read_pdu, it would couple nicely with the concept of framing, without requiring users to depend on tokio if they only intend to use the blocking API.

Let me know if more guidance is needed here.

@naterichman (Contributor, Author)

Let me make sure I'm understanding where you're going. For the reader implementation, it would be something like:

  • Rename read_pdu to something like parse_pdu and leave the trait bound as R: Read, keeping the logic essentially as is but changing the return type to something like Result<Option<Pdu>> instead of Result<Pdu>
  • Making receive async while keeping the existing receive method on ClientAssociation, changing the logic to something like:
pub async fn receive(&mut self) -> Result<Option<Pdu>> {
    loop {
        // Attempt to parse a complete PDU from the buffered bytes.
        if let Some(pdu) = parse_pdu(&mut self.buffer)? {
            return Ok(Some(pdu));
        }

        // Not enough buffered data yet; read more from the socket.
        // A read of 0 bytes means the remote end closed the connection.
        if 0 == self.stream.read_buf(&mut self.buffer).await? {
            if self.buffer.is_empty() {
                return Ok(None);
            } else {
                return Err("connection reset by peer".into());
            }
        }
    }
}

And similarly for the blocking pub fn receive? (Mostly copy/pasted from the framing article you linked 😄)
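
(For illustration, the blocking variant would follow the same shape; a sketch assuming the same parse_pdu, a bytes::BytesMut buffer, and a std::net::TcpStream:)

use std::io::Read;

pub fn receive(&mut self) -> Result<Option<Pdu>> {
    let mut chunk = [0u8; 8192];
    loop {
        // Try to parse a complete PDU from what is already buffered.
        if let Some(pdu) = parse_pdu(&mut self.buffer)? {
            return Ok(Some(pdu));
        }
        // Need more bytes: perform a blocking read from the socket.
        let n = self.stream.read(&mut chunk)?;
        if n == 0 {
            // The peer closed the connection.
            if self.buffer.is_empty() {
                return Ok(None);
            } else {
                return Err("connection reset by peer".into());
            }
        }
        self.buffer.extend_from_slice(&chunk[..n]);
    }
}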

@Enet4 (Owner) commented Jul 23, 2024

Let me make sure I'm understanding where you're going. For the reader implementation, it would be something like:

  • Rename read_pdu to something like parse_pdu and leave the trait bound as R: Read, keeping the logic essentially as is but changing the return type to something like Result<Option<Pdu>> instead of Result<Pdu>

That could work if we use a reader that keeps a buffer of all partial data, so that the data can be re-read until a complete frame is available. In practice, we may be better off having this function receive a value with a better trait bound than Read: a peekable source of bytes which can be extended as more data arrives from the network, and which only consumes the front bytes once the frames comprising them have been fully processed. The bytes crate (part of the tokio ecosystem) offers some nice data types and traits for this, so it may be worth trying it out.

@naterichman (Contributor, Author)

I'm not really following, sorry! What kind of trait bound might be better, something like Buf or BufMut? At that point, why not just make read_pdu take &mut BytesMut or Cursor<&[u8]>? Any chance you could provide a little code snippet of something you had in mind, so I know which direction to go?

@Enet4 (Owner) commented Jul 24, 2024

What kind of trait bound might be better, something like Buf or BufMut? At that point, why not just make read_pdu take &mut BytesMut or Cursor<&[u8]>?

Both BytesMut and Cursor<&[u8]> implement Buf, but they assume a contiguous portion of bytes. The way I see it, the trait would make room for memory usage optimizations by allowing us to read from chained portions of data (think network packet payloads).

Any chance you could provide a little code snippet of something you had in mind so I know which direction to go?

I would just try and see if you can tweak the implementation so that the function signature is fn read_pdu(x: impl Buf) -> Result<Option<Pdu>>. BufMut would offer write access, which we do not need for reading.
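
(For illustration, one possible shape of that function, using BytesMut, which implements Buf, so the fixed six-byte PDU header can be inspected before anything is consumed; Pdu here is a stand-in and the decoding of the PDU body is elided:)

use bytes::{Buf, Bytes, BytesMut};
use std::io;

struct Pdu {
    pdu_type: u8,
    body: Bytes,
}

// DICOM upper-layer PDU header: 1 byte PDU type, 1 reserved byte,
// then a 4-byte big-endian length of the PDU body.
fn read_pdu(buf: &mut BytesMut) -> io::Result<Option<Pdu>> {
    // Not enough data for the header yet; the caller should read more.
    if buf.len() < 6 {
        return Ok(None);
    }
    let pdu_type = buf[0];
    let length = u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]) as usize;

    // Wait until the whole PDU has arrived before consuming anything.
    if buf.len() < 6 + length {
        return Ok(None);
    }

    // Only now consume the bytes from the front of the buffer.
    buf.advance(6);
    let body = buf.copy_to_bytes(length);
    Ok(Some(Pdu { pdu_type, body }))
}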

Commit: "Change is working with async storescp, still need to try how it would work with the sync version"
@naterichman (Contributor, Author) left a comment

Called out a few specific changes. I have it working again as a POC for the storescp command; still need to figure out how the sync client would look. Let me know your thoughts and whether this is the direction you had in mind.

Inline review threads (outdated, resolved): ul/src/association/client.rs, ul/src/pdu/reader.rs
Commits:
* Finish exposing various methods of client/server as feature-gated async
* Finish async PDataWriter (still having issues)
@naterichman (Contributor, Author)

Okay, I'm happier with the implementation now. I have a POC fully working for storescp, but I'm having trouble getting the AsyncPDataWriter working for use with storescu. Currently the association is established fine, but the send_pdata section is not working.

Orthanc logs:

======================= END A-ASSOCIATE-AC ======================
T0807 10:13:53.602828          DICOM-1 CommandDispatcher.cpp:760] (dicom) Received Command:
===================== INCOMING DIMSE MESSAGE ====================
Message Type                  : C-STORE RQ
Presentation Context ID       : 1
Message ID                    : 1
Affected SOP Class UID        : MRImageStorage
Affected SOP Instance UID     : 1.3.6.1.4.1.14519.5.2.1.8421.4004.102660487069874494712993403337
Data Set                      : present
Priority                      : medium
======================= END DIMSE MESSAGE =======================
I0807 10:13:53.602850          DICOM-1 main.cpp:353] Incoming Store request from AET STORE-SCU on IP 127.0.0.1, calling AET ANY-SCP
E0807 10:13:53.603049          DICOM-1 StoreScp.cpp:273] Store SCP Failed: DIMSE Failed to receive message

Wireshark capture:
[screenshot of the capture]

I can attach the actual capture file if needed.

I'm hoping you could:

  1. Go over the existing implementation of read_pdu and write_pdu and the changes to the server/client interface at a high level, so I know whether I'm approaching it right.
  2. Do a similar high-level review of how I've introduced the async functionality into storescu and storescp.
  3. Hopefully help me with the AsyncPDataWriter issue.

After that, I'd like to start writing tests; I just didn't want to write them yet if the interface is going to change.

And then I'd like to do some benchmarking of the async vs. sync code, and also of the new sync code (with the framing) vs. the old sync code.

Let me know your thoughts! Thanks again!

@Enet4 added labels on Aug 7, 2024: A-lib (Area: library), A-tool (Area: tooling), C-ul (Crate: dicom-ul), C-storescu (Crate: dicom-storescu), C-storescp (Crate: dicom-storescp)
@Enet4 (Owner) left a comment

Thank you for continuing with this PR. There is a lot to go through, and it is not clear from the code what the problem with PData reading/writing could be. I can try testing this against other platforms. Until then, I left some feedback inline on things that should be taken care of.

Inline review threads (outdated, resolved): storescp/Cargo.toml, Cargo.toml, encoding/src/text.rs, storescu/Cargo.toml, storescu/src/main.rs, ul/src/association/client.rs, ul/src/association/pdata.rs
@naterichman (Contributor, Author)

Regarding the storescu-to-storescp issue: I was not able to reproduce it. I also added real concurrency to storescu, so now there is a -c flag which controls how many tasks to spin up when sending a lot of files. I tried that a few times and had no issues.

Additionally, I changed the options for storescp so that it is no longer async by default.

Which files were you testing with? I'd like to see if I can reproduce using one of them, or maybe it's because you are on Windows and there are some minor differences (I'm on Linux).

I also updated the documentation in ul!

@Enet4 (Owner) commented Oct 16, 2024

Which files were you testing with? I'd like to see if I can reproduce using one of them, or maybe it's because you are on Windows and there are some minor differences (I'm on Linux).

The file I used was MG1_JPLY from the WG04 test file set, though I'm not sure if the problem was specific to this file. It can also be downloaded from dicom-test-files.

I was using Linux this time, but I can try again on both machines in any case and get back to you. :)

@Enet4 (Owner) left a comment

I tried the same test again and this time it worked! Aside from the comments inline, all that should be left to do here is fix the tests.

Inline review threads: storescp/src/main.rs (outdated, resolved), storescu/src/main.rs (resolved)
@Enet4 (Owner) left a comment

I took the liberty of making some corrections around feature-gating on "async" (this was explained in one of the comments inline). I only have a suggestion for one of the doctests, which I would like you to validate; then we're very likely ready to merge.

Inline review threads: ul/src/association/server.rs (outdated, resolved), .github/workflows/rust.yml (resolved)
@Enet4 linked an issue on Oct 17, 2024 that may be closed by this pull request (Starting the discussion on async support).
@Enet4 (Owner) commented Oct 18, 2024

I would very much like to merge this, but right now the transfer goes wrong whenever the concurrency option in dicom-storescu is set, even when I set it to 1. For instance:

cargo run --bin dicom-storescu -- -c 1 ANYSCP@localhost:1111 .../dicom-test-files/data/WG04/JPLL

Our store SCP starts reporting errors of this kind at random:

2024-10-18T13:50:03.021555Z  INFO dicom_storescp::store_async: New association from STORE-SCU
2024-10-18T13:50:03.031017Z  INFO dicom_storescp::store_async: Stored storage\1.3.6.1.4.1.5962.1.1.20.1.4.20040826185059.5457.dcm
2024-10-18T13:50:03.060996Z ERROR dicom_storescp: failed to read DICOM data object

Caused by these errors (recent errors listed first):
  1: Could not read data set token
  2: Could not read item header
  3: Could not decode element header at position 1311814
  4: Failed to read the item header
  5: failed to fill whole buffer

And often the files won't end up saved to disk (I had to set concurrency to a number greater than 1 and keep trying to find valid files in storage).

I also tested with another store SCP (Dicoogle); it reports errors as well and returns status code 101h.

This could indicate a problem with PDU writing in async mode, so disabling concurrency might just be hiding the problem. I'm afraid this will stay blocked until we find a fix.

@naterichman (Contributor, Author)

I was worried that it was a random/intermittent thing when you said it failed at first but then passed when you ran it again. I will try to reproduce and look into it!

@Enet4 (Owner) left a comment

I was looking into this just now! I still haven't gotten to the root of the problem, but I encountered a few other things in the code worth looking into.


    {
        let mut pdata = scu.send_pdata(pc_selected.id).await;
        pdata.write_all(&object_data).await.unwrap();

Should probably turn this into a recoverable error.
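
For example (a sketch; the surrounding function is assumed to return a compatible Result):

    // Propagate the failure instead of panicking, so one failed transfer
    // can be reported and handled by the caller.
    let mut pdata = scu.send_pdata(pc_selected.id).await;
    pdata.write_all(&object_data).await?;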

Comment on lines +501 to 505:

    if let Err(e) = result {
        error!("{}", Report::from_error(e));
        if fail_first {
            std::process::exit(-2);
        }

This code is testing the error on task join, but not the application error returned by the task. This worked better on my machine.

Suggested change:

    match result {
        Err(e) => {
            error!("{}", Report::from_error(e));
            if fail_first {
                std::process::exit(-2);
            }
        }
        Ok(Err(e)) => {
            error!("{}", Report::from_error(e));
            if fail_first {
                std::process::exit(-2);
            }
        }
        Ok(Ok(_)) => {}
    }
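
(Context for the suggestion: a tokio JoinHandle resolves to Result<T, JoinError>, so when the spawned task itself returns a Result, awaiting the join yields the nested Result<Result<_, E>, JoinError> that the match above unpacks.)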

@naterichman (Contributor, Author)

Will get to those too. Don't worry about figuring out the storescu issue; it's something within the messed-up logic of self.writing in the implementation of AsyncPDataWriter::poll_write. I've had a hell of a time figuring out how to handle the underlying stream write call returning Poll::Pending, so I just need to figure out the correct logic for handling that, but it will get there!

Commits:
* Make AsyncPDataWriter a proper state machine
* Remove use of `<stream>.write_all`, which already loops over input data and removes some of our control; switch to manual use of `poll_write` on the underlying stream
@naterichman (Contributor, Author)

Okay, I believe this is fixed. It only ever came up on sufficiently large files, and I believe it had to do with how writes on the tokio TcpStream work: a lot of data can be written immediately, but with a large enough amount to transfer, it can sometimes return Pending instead of Ready. This explains why I had no issues sending a lot of smaller files, and was then able to reproduce by trying to send a large ultrasound DICOM file.

I'll be honest, I feel much better about this implementation too, since the original was mostly guided by ChatGPT... I understand this implementation much better, and I left a decent number of inline comments explaining it for future reference.
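
(For reference, the general shape of the pattern under discussion; a sketch, not the PR's actual AsyncPDataWriter:)

use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;

// A wrapper that must tolerate partial writes and Pending from the inner
// stream: it remembers how much of its buffer was already accepted and
// resumes from that position on the next poll.
struct FlushBuffered<S> {
    inner: S,
    buffer: Vec<u8>, // bytes queued for the inner stream
    written: usize,  // how many bytes of `buffer` were accepted so far
}

impl<S: AsyncWrite + Unpin> FlushBuffered<S> {
    fn poll_flush_buffer(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        while self.written < self.buffer.len() {
            match Pin::new(&mut self.inner).poll_write(cx, &self.buffer[self.written..]) {
                // Pending: return without losing our position; the waker will
                // re-poll us and the loop resumes at `self.written`.
                Poll::Pending => return Poll::Pending,
                // A zero-length write means the stream cannot accept more data.
                Poll::Ready(Ok(0)) => {
                    return Poll::Ready(Err(io::ErrorKind::WriteZero.into()))
                }
                // A short write just advances the cursor; never assume the
                // whole slice was accepted.
                Poll::Ready(Ok(n)) => self.written += n,
                Poll::Ready(Err(e)) => return Poll::Ready(Err(e)),
            }
        }
        self.buffer.clear();
        self.written = 0;
        Poll::Ready(Ok(()))
    }
}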

@Enet4 (Owner) left a comment

OK, I did a few more tests and found no more issues. Terrific work!

@Enet4 merged commit 389cd8a into Enet4:master on Oct 23, 2024. 4 checks passed.
@Enet4 added the label breaking change (hint that this may require a major version bump on release) on Nov 1, 2024.
Labels: A-lib (Area: library), A-tool (Area: tooling), breaking change, C-storescp (Crate: dicom-storescp), C-storescu (Crate: dicom-storescu), C-ul (Crate: dicom-ul)

May close: Starting the discussion on async support

3 participants