[toimage]: bulk conversion working #397

PierreBou91 · 2023-07-23T12:38:58Z

As per #343 I have implemented the possibility to convert multiple dicoms to a folder of images.

For consistency, I have copied a lot of the logic from the dicom-storescu but I don't think it's worth extracting it yet.

Here are my questions/concerns about this commit and I need some guidance:

1. Not tested on multiframe dicom:
I am very unfamiliar with multiframe dicoms and I don't really know where to find or create proper ones. I was not able to test this commit with multiframe dicoms.
The basic functionality should be preserved but even though, I'm not certain that this is the expected behavior (convert the nth frame of every multiframe dicom in the input).
This is by far my biggest concern for this commit because there might be regressions and I don't know of an acceptable way to handle this since the use case (multiframe to image) is unclear to me.

2. Error enum with only Error::Other variant:
Initially I copied the Error enum from dicom-storescu required in the check_file function return type that I also imported. But since Error::Other is the only variant, maybe should it just be removed altogether ?

3. output flag was replaced by out-dir:
This has several consequences:

The possibility to rename the output file for a single conversion is not retained.
If several files have the same name in the set to be converted they will be overridden except for the last one being converted.

I just went to the simplest implementation that matched the specs in #343. If the mentioned consequences are not acceptable, I can try and find some ways to fix them.

4. The format flag currently takes a String:
It works as is but since the image crate is exposing an image::ImageFormat enum, this can be a good base for clap's checking and suggestions.
I refrained from doing it because I didn't know if every image format should be allowed or any other restriction I don't know about.

Overall this commit is far from perfect but I still opened this PR since I already find it more convenient than DCMTK's equivalent binary.

Enet4 · 2023-07-23T12:52:49Z

Thank you for working on this. As I suspect that we might need some time to review and I was about to bring forward a new release of DICOM-rs, I propose that this is worked on after 0.6.0 is out. This should not stop me from publishing this enhancement in a subsequent minor release anyway.

PierreBou91 · 2023-07-23T13:05:08Z

Oh yeah sure, I think you're right ! I'll check in back here when the 0.6.0 is out.

Enet4 · 2023-07-28T18:22:02Z

1. Not tested on multiframe dicom: I am very unfamiliar with multiframe dicoms and I don't really know where to find or create proper ones. I was not able to test this commit with multiframe dicoms. The basic functionality should be preserved but even though, I'm not certain that this is the expected behavior (convert the nth frame of every multiframe dicom in the input). This is by far my biggest concern for this commit because there might be regressions and I don't know of an acceptable way to handle this since the use case (multiframe to image) is unclear to me.

DICOM WG4 has a data set with some example files, and the pydicom test files also have some files which are multi-frame. Running cargo test on the project already downloads some of these via dicom-test-files (see for example pydicom/color3d_jpeg_baseline.dcm).

In practice, I can't be sure which behavior is more suitable, but we can either a) pick the first frame only, or b) print all frames into separate files with an extension to the file name, e.g. abc.dcm to abc_0.png, abc_1.png, abc_2.png,...

2. Error enum with only Error::Other variant: Initially I copied the Error enum from dicom-storescu required in the check_file function return type that I also imported. But since Error::Other is the only variant, maybe should it just be removed altogether ?

Generally, we would go in the opposite direction from here: given this Error type, we can slowly add more specific error variants and replace context calls accordingly.

3. output flag was replaced by out-dir: This has several consequences:

The possibility to rename the output file for a single conversion is not retained.

If several files have the same name in the set to be converted they will be overridden except for the last one being converted.

I just went to the simplest implementation that matched the specs in #343. If the mentioned consequences are not acceptable, I can try and find some ways to fix them.

I admit that this was not made clear in the original issue, but I would prefer that the functionality to convert a single image is retained. It is OK if the code ends up with a bit of redundancy. --out-dir and -o should be marked as conflicting with each other, so it's either one or the other.

4. The format flag currently takes a String: It works as is but since the image crate is exposing an image::ImageFormat enum, this can be a good base for clap's checking and suggestions. I refrained from doing it because I didn't know if every image format should be allowed or any other restriction I don't know about.

That should be fine. At least adding support for extra image formats is as simple as updating image and its feature set.

PierreBou91 · 2023-07-30T09:35:41Z

Thanks for your guidance @Enet4 !

I made good progress on this PR, but I have an issue and I'm not sure what is the best way to handle it:

To select the image format, it's only necessary to set the extension of the PathBuf to the target extension.

So far, I took the original file name and set the extension but I realized there is an issue with filenames that don't include an extension and include a "." which is unfortunately pretty common.
In such cases, methods like file_stem or set_extension don't work correctly since they consider the last portion of the filename (after the last ".") to be an extension so it get lost.

For example given a file f with a filename of "1.2.276.0.123":

calling set_extension("png") on f would return "1.2.276.0.png" (losing the last part)
calling file_stem() on f would return "1.2.276.0" (also losing the last part)

Below is a code snippet to illustrate:

use std::{ffi::OsStr, path::PathBuf};

fn main() {
    let mut path = PathBuf::from("1.2.276.0.123");
    path.set_extension("png");
    assert_eq!(path, PathBuf::from("1.2.276.0.png"));

    let path = PathBuf::from("1.2.276.0.123");
    let stem = path.file_stem().unwrap();
    assert_eq!(stem, OsStr::new("1.2.276.0"));
}

As for the workaround, here are the propositions I though of:

We could force the user to pass a filename command giving some template of the filename based on some dicom tags.
example: --filename "{PatientName}-{SOPInstanceUID}"
This would allow to only have to append the target to this pattern and be done with it.
I implemented a similar logic on an other project and it works fine if the errors (element not found) are correctly handled.
The main downside of this is that it introduces complexity for the user.
We could just append the target extension regardless of the original filename. (I think that's how dcmtk behaves)
This would work but we could easily end up with filenames that end in "123.dcm.png" which is not so pretty.
We could force filenames to be something predictable like a timestamp.
The downside would be that matching original files with their images is harder.

If someone has an other suggestion as well or any opinion on the best one it would be great.

Thanks

Enet4 · 2023-07-30T13:03:30Z

I would detect .dcm and only replace the extension in this case, extending the file name in all other cases.

PierreBou91 · 2023-08-05T09:48:41Z

Hello,

This should be good.

Multiframe images:

The user can pass the -F flag to ask for every frame to be converted
If -F is passed, every image will have a _n appended before the extension
By default, only the first frame will be converted

Error handling:

I added several other errors to make a better use of Snafu

Single file conversion:

The user can pass a single file (instead of a folder)
The user can use the -O option to rename the single file (excluding the extension that should be set with t)
The -O option will be ignored with a warning if multiple files are passed for conversion

File names:

dicom-toimage correctly handles name collisions by appending (n) before the extension if a file already has the name in the output folder.

Please tell me if any other change should be made, I'll gladly oblige.

Enet4

OK, so I had a chance to test this in practice, and I found these concerns:

I tried to do this, but it still requested out_file_name even though I was making multiple files.

λ dicom-toimage.exe -o mydata-images/ mydata/
error: the following required arguments were not provided:
  --out-file-name <OUT_FILE_NAME>

Usage: dicom-toimage.exe --out-file-name <OUT_FILE_NAME> --out-dir 
<OUT_DIR> <FILES>...

Adding the option with bogus data (-O 89u3tuh389) would work, although it was not doing anything in practice.

To maintain backwards compatibility, I propose that -o continues to be used for the output file name, and -d is used for the output directory, so that the examples below work.
```
dicom-toimage myfile.dcm -o myimage.png
dicom-toimage myfiles/ -d myimages/
```
By replacing frame_number with all_frames, We have lost the capability to select a specific frame to output. Here is an idea: we can make a hybrid of the two into someting like frames. If the option's value is "all", then the behavior is to write all frames. Otherwise, it is treated as a single integer of the frame number.

At least the first and second concerns should be handled here before merging. The third point is a bit of extra work, so maybe it can be worked on separately. Let me know if you need further assistance.

PierreBou91 · 2023-08-26T08:50:00Z

Hey,

Thanks for the testing and feedbacks!

I'm currently abroad but I'll make sure to make the modifications as soon as I'm back and have a little time. (likely during a weekend of September).

PierreBou91 · 2023-10-01T16:22:06Z

Hey hello, @Enet4

I am having trouble resolving conflicts in a clean manner. So I plan on starting from a new PR directly from master currents version of the toimage crate. So unless you have a better suggestion, I think we can close this PR.

I just have a quick question. I see that on master's version of main, you have extended the errors and also attributed some specific exit codes. Is there a specification on how exit codes should be attributed or should I just decrease as I add more errors (and group the one that share the same source as I think this is what you've done ?

Thanks !

Pierre

Enet4 · 2023-10-01T19:22:17Z

Hello @PierreBou91!

So unless you have a better suggestion, I think we can close this PR.

It is fine either way. It should also be possible to keep the pull request by hard-resetting the branch and doing a forced push.

As for the exit code, it was primarily to differentiate between different kinds of errors, but none of it is intended to follow a stable guideline, so feel free to rearrange them as needed.

PierreBou91 · 2023-10-08T22:30:28Z

I hard reset and force-pushed as advised and reimplemented the bulk conversion.

User can now pass a path to a file, multiple files or a folder (possibly recursively with -r flag). All the file paths will be collected in a Vec<InOuPair> and the output path of will be directly computed according to the --output to chose the name, --ext to chose the extension and, --dir to chose an output directory. The previous run() function is essentially unchanged, just moved in the convert_file function.

I have not reimplemented the frame selection for multiframe dicoms because I'm not sure yet how to do it with the --unwrap flag. I'll try to research the fragments because I'm not familiar with it and the vast majority of the dicoms I have tested did not successfully converted with --unwrap (only the multiframe dicom example worked fine).
Do you think we could see the multiframe part in a different PR ?

Please tell me if you see any improvements to be made. I also intent on doing an improved error handling for the inventory part.

- should pass one or more files, otherwise it's an error

and update README accordingly

- tweak phrasing and expand error conditions - adjust doctests to be more idiomatic

- add `promiscuous`, implemented by Enet4#500 - adjust documentation of other arguments

- rstest 0.19 - cargo update

- update description - set Edition to 2021 - cargo update

- updates ureq to 2.9.7

Enet4 added enhancement A-tool Area: tooling C-toimage Crate: dicom-toimage labels Jul 23, 2023

Enet4 self-requested a review July 24, 2023 07:53

PierreBou91 force-pushed the imp/toimage/bulk-conversion branch from 6bd996a to 5bbbcf2 Compare July 29, 2023 09:16

PierreBou91 marked this pull request as draft July 29, 2023 20:09

PierreBou91 force-pushed the imp/toimage/bulk-conversion branch from f7f18ab to 3c62392 Compare August 5, 2023 08:44

PierreBou91 marked this pull request as ready for review August 5, 2023 09:48

Enet4 requested changes Aug 24, 2023

View reviewed changes

PierreBou91 marked this pull request as draft October 8, 2023 21:40

PierreBou91 closed this Oct 8, 2023

PierreBou91 force-pushed the imp/toimage/bulk-conversion branch from b54b8b8 to b714dcc Compare October 8, 2023 21:41

PierreBou91 reopened this Oct 8, 2023

[toimage] Bulk conversion Ok

aedd300

Enet4 self-requested a review April 13, 2024 13:42

Enet4 marked this pull request as ready for review April 13, 2024 13:42

Enet4 force-pushed the imp/toimage/bulk-conversion branch from 7e71581 to aedd300 Compare April 13, 2024 13:45

Enet4 added 3 commits April 13, 2024 14:46

[dicom-toimage] mark file arg as required

31be0ed

- should pass one or more files, otherwise it's an error

[toimage] Remove redundant import

7685c96

[toimage] Tweak arg docs

57209e9

and update README accordingly

Enet4 and others added 27 commits May 15, 2024 11:44

[object] Tweak documentation of new methods

036b1f2

- tweak phrasing and expand error conditions - adjust doctests to be more idiomatic

[storescp] Update CLI app arguments

30855ea

- add `promiscuous`, implemented by Enet4#500 - adjust documentation of other arguments

Update dependencies

af0d276

- rstest 0.19 - cargo update

Update fuzz projects

6f25024

- update description - set Edition to 2021 - cargo update

cargo update

0e50f2c

cargo update again

7aaf633

- updates ureq to 2.9.7

format all code

d755ecd

[core] Bump to v0.7.0

6556757

[dictionary-std] Bump to v0.7.0

44649eb

[encoding] Bump to v0.7.0

fb273dd

[parser] Bump to v0.7.0

99f273b

[transfer-syntax-registry] Bump to v0.7.0

45a218a

[ul] Bump to v0.7.0

28fd0a6

[scpproxy] Bump to v0.7.0

2f57fbe

[object] Bump to v0.7.0

a26591a

[dump] Bump to v0.7.0

4b86187

[json] Bump to v0.7.0

4ac1151

[pixeldata] Bump to v0.7.0

be09581

[echoscu] Bump to v0.7.0

21a24af

[storescu] Bump to v0.7.0

8b353be

[storescp] Bump to v0.7.0

e21626a

[findscu] Bump to v0.7.0

c537d6e

[toimage] Bump to v0.7.0

42661ea

[fromimage] Bump to v0.7.0

0793482

[dictionary-builder] Bump to v0.7.0

6fc0af5

[dicom] Bump to v0.7.0

451c64a

Merge branch 'Enet4:master' into imp/toimage/bulk-conversion

a1c3bc3

PierreBou91 closed this May 15, 2024

PierreBou91 deleted the imp/toimage/bulk-conversion branch May 15, 2024 20:29

PierreBou91 mentioned this pull request May 15, 2024

[toimage] basic bulk conversion #518

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[toimage]: bulk conversion working #397

[toimage]: bulk conversion working #397

PierreBou91 commented Jul 23, 2023 •

edited

Loading

Enet4 commented Jul 23, 2023

PierreBou91 commented Jul 23, 2023

Enet4 commented Jul 28, 2023

PierreBou91 commented Jul 30, 2023

Enet4 commented Jul 30, 2023

PierreBou91 commented Aug 5, 2023 •

edited

Loading

Enet4 left a comment •

edited

Loading

PierreBou91 commented Aug 26, 2023

PierreBou91 commented Oct 1, 2023

Enet4 commented Oct 1, 2023

PierreBou91 commented Oct 8, 2023 •

edited

Loading

[toimage]: bulk conversion working #397

[toimage]: bulk conversion working #397

Conversation

PierreBou91 commented Jul 23, 2023 • edited Loading

Enet4 commented Jul 23, 2023

PierreBou91 commented Jul 23, 2023

Enet4 commented Jul 28, 2023

PierreBou91 commented Jul 30, 2023

Enet4 commented Jul 30, 2023

PierreBou91 commented Aug 5, 2023 • edited Loading

Enet4 left a comment • edited Loading

Choose a reason for hiding this comment

PierreBou91 commented Aug 26, 2023

PierreBou91 commented Oct 1, 2023

Enet4 commented Oct 1, 2023

PierreBou91 commented Oct 8, 2023 • edited Loading

PierreBou91 commented Jul 23, 2023 •

edited

Loading

PierreBou91 commented Aug 5, 2023 •

edited

Loading

Enet4 left a comment •

edited

Loading

PierreBou91 commented Oct 8, 2023 •

edited

Loading