Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[toimage]: bulk conversion working #397

Closed

Conversation

PierreBou91
Copy link
Contributor

@PierreBou91 PierreBou91 commented Jul 23, 2023

As per #343 I have implemented the possibility to convert multiple dicoms to a folder of images.

For consistency, I have copied a lot of the logic from the dicom-storescu but I don't think it's worth extracting it yet.

Here are my questions/concerns about this commit and I need some guidance:

1. Not tested on multiframe dicom:
I am very unfamiliar with multiframe dicoms and I don't really know where to find or create proper ones. I was not able to test this commit with multiframe dicoms.
The basic functionality should be preserved but even though, I'm not certain that this is the expected behavior (convert the nth frame of every multiframe dicom in the input).
This is by far my biggest concern for this commit because there might be regressions and I don't know of an acceptable way to handle this since the use case (multiframe to image) is unclear to me.

2. Error enum with only Error::Other variant:
Initially I copied the Error enum from dicom-storescu required in the check_file function return type that I also imported. But since Error::Other is the only variant, maybe should it just be removed altogether ?

3. output flag was replaced by out-dir:
This has several consequences:

  • The possibility to rename the output file for a single conversion is not retained.
  • If several files have the same name in the set to be converted they will be overridden except for the last one being converted.

I just went to the simplest implementation that matched the specs in #343. If the mentioned consequences are not acceptable, I can try and find some ways to fix them.

4. The format flag currently takes a String:
It works as is but since the image crate is exposing an image::ImageFormat enum, this can be a good base for clap's checking and suggestions.
I refrained from doing it because I didn't know if every image format should be allowed or any other restriction I don't know about.

Overall this commit is far from perfect but I still opened this PR since I already find it more convenient than DCMTK's equivalent binary.

@Enet4 Enet4 added enhancement A-tool Area: tooling C-toimage Crate: dicom-toimage labels Jul 23, 2023
@Enet4
Copy link
Owner

Enet4 commented Jul 23, 2023

Thank you for working on this. As I suspect that we might need some time to review and I was about to bring forward a new release of DICOM-rs, I propose that this is worked on after 0.6.0 is out. This should not stop me from publishing this enhancement in a subsequent minor release anyway.

@PierreBou91
Copy link
Contributor Author

Oh yeah sure, I think you're right ! I'll check in back here when the 0.6.0 is out.

@Enet4 Enet4 self-requested a review July 24, 2023 07:53
@Enet4
Copy link
Owner

Enet4 commented Jul 28, 2023

1. Not tested on multiframe dicom: I am very unfamiliar with multiframe dicoms and I don't really know where to find or create proper ones. I was not able to test this commit with multiframe dicoms. The basic functionality should be preserved but even though, I'm not certain that this is the expected behavior (convert the nth frame of every multiframe dicom in the input). This is by far my biggest concern for this commit because there might be regressions and I don't know of an acceptable way to handle this since the use case (multiframe to image) is unclear to me.

DICOM WG4 has a data set with some example files, and the pydicom test files also have some files which are multi-frame. Running cargo test on the project already downloads some of these via dicom-test-files (see for example pydicom/color3d_jpeg_baseline.dcm).

In practice, I can't be sure which behavior is more suitable, but we can either a) pick the first frame only, or b) print all frames into separate files with an extension to the file name, e.g. abc.dcm to abc_0.png, abc_1.png, abc_2.png,...

2. Error enum with only Error::Other variant: Initially I copied the Error enum from dicom-storescu required in the check_file function return type that I also imported. But since Error::Other is the only variant, maybe should it just be removed altogether ?

Generally, we would go in the opposite direction from here: given this Error type, we can slowly add more specific error variants and replace context calls accordingly.

3. output flag was replaced by out-dir: This has several consequences:

  • The possibility to rename the output file for a single conversion is not retained.
  • If several files have the same name in the set to be converted they will be overridden except for the last one being converted.

I just went to the simplest implementation that matched the specs in #343. If the mentioned consequences are not acceptable, I can try and find some ways to fix them.

I admit that this was not made clear in the original issue, but I would prefer that the functionality to convert a single image is retained. It is OK if the code ends up with a bit of redundancy. --out-dir and -o should be marked as conflicting with each other, so it's either one or the other.

4. The format flag currently takes a String: It works as is but since the image crate is exposing an image::ImageFormat enum, this can be a good base for clap's checking and suggestions. I refrained from doing it because I didn't know if every image format should be allowed or any other restriction I don't know about.

That should be fine. At least adding support for extra image formats is as simple as updating image and its feature set.

@PierreBou91 PierreBou91 marked this pull request as draft July 29, 2023 20:09
@PierreBou91
Copy link
Contributor Author

Thanks for your guidance @Enet4 !

I made good progress on this PR, but I have an issue and I'm not sure what is the best way to handle it:

To select the image format, it's only necessary to set the extension of the PathBuf to the target extension.

So far, I took the original file name and set the extension but I realized there is an issue with filenames that don't include an extension and include a "." which is unfortunately pretty common.
In such cases, methods like file_stem or set_extension don't work correctly since they consider the last portion of the filename (after the last ".") to be an extension so it get lost.

For example given a file f with a filename of "1.2.276.0.123":

  • calling set_extension("png") on f would return "1.2.276.0.png" (losing the last part)
  • calling file_stem() on f would return "1.2.276.0" (also losing the last part)

Below is a code snippet to illustrate:

use std::{ffi::OsStr, path::PathBuf};

fn main() {
    let mut path = PathBuf::from("1.2.276.0.123");
    path.set_extension("png");
    assert_eq!(path, PathBuf::from("1.2.276.0.png"));

    let path = PathBuf::from("1.2.276.0.123");
    let stem = path.file_stem().unwrap();
    assert_eq!(stem, OsStr::new("1.2.276.0"));
}

As for the workaround, here are the propositions I though of:

  1. We could force the user to pass a filename command giving some template of the filename based on some dicom tags.
    example: --filename "{PatientName}-{SOPInstanceUID}"
    This would allow to only have to append the target to this pattern and be done with it.
    I implemented a similar logic on an other project and it works fine if the errors (element not found) are correctly handled.
    The main downside of this is that it introduces complexity for the user.

  2. We could just append the target extension regardless of the original filename. (I think that's how dcmtk behaves)
    This would work but we could easily end up with filenames that end in "123.dcm.png" which is not so pretty.

  3. We could force filenames to be something predictable like a timestamp.
    The downside would be that matching original files with their images is harder.

If someone has an other suggestion as well or any opinion on the best one it would be great.

Thanks

@Enet4
Copy link
Owner

Enet4 commented Jul 30, 2023

I would detect .dcm and only replace the extension in this case, extending the file name in all other cases.

@PierreBou91
Copy link
Contributor Author

PierreBou91 commented Aug 5, 2023

Hello,

This should be good.

  1. Multiframe images:
  • The user can pass the -F flag to ask for every frame to be converted
  • If -F is passed, every image will have a _n appended before the extension
  • By default, only the first frame will be converted
  1. Error handling:
  • I added several other errors to make a better use of Snafu
  1. Single file conversion:
  • The user can pass a single file (instead of a folder)
  • The user can use the -O option to rename the single file (excluding the extension that should be set with t)
  • The -O option will be ignored with a warning if multiple files are passed for conversion
  1. File names:
  • dicom-toimage correctly handles name collisions by appending (n) before the extension if a file already has the name in the output folder.

Please tell me if any other change should be made, I'll gladly oblige.

@PierreBou91 PierreBou91 marked this pull request as ready for review August 5, 2023 09:48
Copy link
Owner

@Enet4 Enet4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so I had a chance to test this in practice, and I found these concerns:

  • I tried to do this, but it still requested out_file_name even though I was making multiple files.
    λ dicom-toimage.exe -o mydata-images/ mydata/
    error: the following required arguments were not provided:
      --out-file-name <OUT_FILE_NAME>
    
    Usage: dicom-toimage.exe --out-file-name <OUT_FILE_NAME> --out-dir 
    <OUT_DIR> <FILES>...
    
    Adding the option with bogus data (-O 89u3tuh389) would work, although it was not doing anything in practice.
  • To maintain backwards compatibility, I propose that -o continues to be used for the output file name, and -d is used for the output directory, so that the examples below work.
    dicom-toimage myfile.dcm -o myimage.png
    dicom-toimage myfiles/ -d myimages/
    
  • By replacing frame_number with all_frames, We have lost the capability to select a specific frame to output. Here is an idea: we can make a hybrid of the two into someting like frames. If the option's value is "all", then the behavior is to write all frames. Otherwise, it is treated as a single integer of the frame number.

At least the first and second concerns should be handled here before merging. The third point is a bit of extra work, so maybe it can be worked on separately. Let me know if you need further assistance.

@PierreBou91
Copy link
Contributor Author

Hey,

Thanks for the testing and feedbacks!

I'm currently abroad but I'll make sure to make the modifications as soon as I'm back and have a little time. (likely during a weekend of September).

@PierreBou91
Copy link
Contributor Author

Hey hello, @Enet4

I am having trouble resolving conflicts in a clean manner. So I plan on starting from a new PR directly from master currents version of the toimage crate. So unless you have a better suggestion, I think we can close this PR.

I just have a quick question. I see that on master's version of main, you have extended the errors and also attributed some specific exit codes. Is there a specification on how exit codes should be attributed or should I just decrease as I add more errors (and group the one that share the same source as I think this is what you've done ?

Thanks !

Pierre

@Enet4
Copy link
Owner

Enet4 commented Oct 1, 2023

Hello @PierreBou91!

So unless you have a better suggestion, I think we can close this PR.

It is fine either way. It should also be possible to keep the pull request by hard-resetting the branch and doing a forced push.

As for the exit code, it was primarily to differentiate between different kinds of errors, but none of it is intended to follow a stable guideline, so feel free to rearrange them as needed.

@PierreBou91 PierreBou91 marked this pull request as draft October 8, 2023 21:40
@PierreBou91 PierreBou91 closed this Oct 8, 2023
@PierreBou91 PierreBou91 reopened this Oct 8, 2023
@PierreBou91
Copy link
Contributor Author

PierreBou91 commented Oct 8, 2023

I hard reset and force-pushed as advised and reimplemented the bulk conversion.

User can now pass a path to a file, multiple files or a folder (possibly recursively with -r flag). All the file paths will be collected in a Vec<InOuPair> and the output path of will be directly computed according to the --output to chose the name, --ext to chose the extension and, --dir to chose an output directory. The previous run() function is essentially unchanged, just moved in the convert_file function.

I have not reimplemented the frame selection for multiframe dicoms because I'm not sure yet how to do it with the --unwrap flag. I'll try to research the fragments because I'm not familiar with it and the vast majority of the dicoms I have tested did not successfully converted with --unwrap (only the multiframe dicom example worked fine).
Do you think we could see the multiframe part in a different PR ?

Please tell me if you see any improvements to be made. I also intent on doing an improved error handling for the inventory part.

@Enet4 Enet4 self-requested a review April 13, 2024 13:42
@Enet4 Enet4 marked this pull request as ready for review April 13, 2024 13:42
@Enet4 Enet4 force-pushed the imp/toimage/bulk-conversion branch from 7e71581 to aedd300 Compare April 13, 2024 13:45
- should pass one or more files,
  otherwise it's an error
and update README accordingly
@PierreBou91 PierreBou91 deleted the imp/toimage/bulk-conversion branch May 15, 2024 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tool Area: tooling C-toimage Crate: dicom-toimage enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants