Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for saving segments in a single file #464

Merged
merged 29 commits into from
Nov 16, 2023

Conversation

fedorov
Copy link
Member

@fedorov fedorov commented Jan 25, 2023

WIP towards addressing #462

@fedorov
Copy link
Member Author

fedorov commented Jan 25, 2023

Failures of the round-trip tests are expected, since output is not what is expected.

Issues identified that need to be fixed:

@fedorov
Copy link
Member Author

fedorov commented Feb 6, 2023

🎆

@fedorov
Copy link
Member Author

fedorov commented Feb 6, 2023

Now I need to fix DICOMSegmentationPlugin.py which unfortunately makes the assumption that there will always be one segment per nrrd file:

https://github.com/QIICR/QuantitativeReporting/blob/71f01876d95b6190218b4e23b2dd30d9d5e82488/DICOMPlugins/DICOMSegmentationPlugin.py#L142-L150

fedorov and others added 13 commits April 7, 2023 13:08
perhaps this is faster than iterating over the pixels in a loop?
Introducing a new class OverlapUtil that is able to check whether
segments of a given DICOM segmentation are overlapping.

It computes an overlap matrix denoting which segments overlap with each
other. Also, from that it can derive groups of segments that do not
overlap, e.g. for exporting them later into NRRD files or similar
formats that cannot deal with overlapping segments.
The code no produces first results - NRRD file(s) for merged segments,
though they still need to be checked for correctness.

Also on some segmentations the code still seems to crash (to be fixed).
Ensure that errors are passed up the hierarchy. This esp. fixes handling
of files with non-consecutive segment numbers (such as an old verison of
the brain atlas that uses weird segment numbers).
Fix computation of Image origin for ITK which has been introduced when
refactoring the code.

Let DCMTK make use of STL classes which gives a some speedup in
particular regarding the usage of OFMap that is used in the functional
group classes (dcmfg module).
Fix assignment of frames to NRRD file (used index instead of vector element
at that index).

Speed up writing of segmentation by disabling functional group structure
checking.

The segmentation to ITK validator should now produce valid NRRD files,
grouping non-overlapping segments into the same file (if option
--mergedSegments is being provided on the command line).
This also make sure that the updated workflow actions are used.
Since ITK to DICOM and DICOM to ITK conversion do not have much in common,
the original class ImageSEGConverter is now splitted into Itk2DicomConverter
and Dicom2ItkConverter.
@fedorov
Copy link
Member Author

fedorov commented Nov 3, 2023

@michaelonken I think I know why it fails. The NRRD outputs you are saving start with prefix 0, while they used to start from 1. E.g., 23x38x3-0.nrrd should have been named 23x38x3-1.nrrd. It should be a trivial fix, since I just think you need to bump the key returned in this map to start from 1: https://github.com/fedorov/dcmqi/blob/merged-segments/apps/seg/segimage2itkimage.cxx#L43. I think it would be better it is fixed at the time that map is created rather than in the loop that writes the files, for consistency of the API, so I think it will be more efficient if you do it, because I can't tell for sure what this may affect.

@michaelonken
Copy link
Member

Makes sense,thanks! I'll try this on Monday.

@fedorov
Copy link
Member Author

fedorov commented Nov 6, 2023

@michaelonken here are commands to download the image and segmentation for an oblique acquisition:

  • seg: s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/c6d7ad8a-835c-478f-956e-a09171698f51/*' .
  • image: s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com cp 's3://idc-open-data/89ac2bc6-893e-4fc7-9dcc-4731617fd716/*' .

You will need to have s5cmd installed first: https://github.com/peak/s5cmd.

For the sake of completeness, the above commands will download corresponding DICOM series from the public AWS buckets maintained by NCI Imaging Data Commons (IDC). The samples are from the QIN-Prostate-Repeatablity collection, which is hosted on IDC.

Fixed tests caused by bug not exporting last frame of a segment.
The underlying API has been changed so usage error is less likely
to occur.

Simplified includes, better error messages, documentation.
Bumping node version -- but not too high since at some version node
requires higher GLIBC version. Higher verisons than 12.19.1 may work,
though, earlier versions did not work.
The coordinate relevant for frame sorting is now identified by computing the
cross product for image orientation patient and selecting the coordinate that
results in the bigges absolute value.

Minor enhancements in error handling and documentation.
Before, all ITK images have been returned at once after conversion.
Now, this is done with an iterator-like access that allows to
delete an ITK image once produced by the converter. The converter class
does not keep any reference to the ITK image, so once it goes out of
scope of the "user" class (like segimage2itkimage.cxx), memory
of the image is freed (converter returns smart pointer).
Fixed bug when creating the JSON for merged segments where more than
one file was produced. Instead of then splitting segments into different
entries in the outer array of the JSON, the segments have been written
to the inner array instead.
Added test that writes 3 segments, 2 of them non-overlapping. That means
that two NRRD files will be produced, one with 2 segments, the other
with only a single one.
@michaelonken
Copy link
Member

Current status of this pull request:

  • Overall state: The DICOM to ITK conversion is now in a quite stable state -- from my point of view :-). More tests would be good though since I used limited amount of test data so far.
  • Refactoring: I refactored the SegImageConverter code into two separate classes (Itk2DicomConverter and Dicom2ItkConverter) since both use cases did not have much doe in common. Also I (earlier) refactored the DIcom2Itk code quite a lot to make it more readable and reusable.
  • Tests: Tests are working now (again) on all 3 supported platforms, including one test to verify the NRRD and JSON files produced for the --mergeSegment case.

Weaknesses:

  • The slice sorting is done by identifying the relevant coordinate for sorting (by evaluation cross product of image orientation vectors). The resulting frame (positions) are grouped into a "logical frame" if they are not further away than 1% of slice thickness. I am not sure whether this is a valid approach or not.
  • Right now the overlap checking compares 8 pixels at once through a binary AND operation. This works well for images where rows * cols MOD 8 is 0, i.e. dividable by 8 since then it is easy to access each frame separately. When this is not the case, frames need to be unpacked from 1 bit binary DICOM into 8 bit pixel cells (one byte per bit) which takes significant time. It should be evaluated whether the byte-wise comparison method also works for rows * cols % 8!=0 cases, or, at least with some extra code.

Other remaining work:

  • Bonus: NRRD export w/o ITK: Looking at the simplicity of the NRRD files, I could imagine a direct export from DICOM to NRRD without the ITK detour would also be significantly faster. But maybe we can see whether performance and memory is already good enough right now. For instance, writing the BWH Total Segmentator example to a merged ITK image only(?) takes around 5 seconds right now. This could also go to DCMTK (as the OverlapUtil class probably will) as a little extra tool.

@fedorov fedorov assigned pieper and unassigned pieper Nov 14, 2023
@fedorov fedorov requested a review from pieper November 14, 2023 17:45
@fedorov
Copy link
Member Author

fedorov commented Nov 14, 2023

@michaelonken I built the branch, and it appears that the output files that are created by segimage2itkimage are still numbered starting from 0. Is there a reason for this? I think there is benefit to maintaining numbering conventions for compatibility with other tools. Specifically, I ran across this because Slicer plugin expected 0-based numbering. I can fix that, but I wonder if there are other tools/workflows that can get broken because of this change.

@michaelonken
Copy link
Member

@michaelonken I built the branch, and it appears that the output files that are created by segimage2itkimage are still numbered starting from 0.

Hm, I could not confirm this on my system, e.g. see the following log:

/dcmqi-build/bin/segimage2itkimage --inputDICOM ~/data/dcm/SEG/bwh_NLST_207114_TotalSegmentator/seg/segmentation.dcm --outputDirectory /tmp

dcmqi repository URL: git@github.com:QIICR/dcmqi.git revision: c1f09ef tag: latest-89-gc1f09ef
Row direction: 1 0 0
Col direction: 0 -1 0
Z direction: 0 0 -1
Total frames: 9654
Total frames with unique IPP: 455
Total overlapping frames: 455
Origin: [-179.1, 185.217, -31.755]
Slice extent: 329.15
Slice spacing: 0.725002
Will not merge segments: Splitting segments into 77 groups
Writing itk image to /tmp/1.nrrd ... done
Writing itk image to /tmp/2.nrrd ... done

Also using --mergedSegements starts exports with:
Writing itk image to /tmp/1.nrrd ... done
Listing /tmp also shows that there is no file /tmp/0.nrrd and the lowest number found is /tmp/1.nrrd.

Maybe I am overlooking something, or have you maybe used an old version that has been in your PATH or so?

@fedorov
Copy link
Member Author

fedorov commented Nov 15, 2023

@michaelonken I am very sorry - I messed it up. Indeed, I was on a stale branch. I thought I updated, but I was multitasking, and apparently I did something wrong. Output file numbering works as expected! I am fixing few items in the Slicer QuantitativeReporting extension in this PR that I discovered in the process of testing.

@fedorov
Copy link
Member Author

fedorov commented Nov 15, 2023

@michaelonken I did testing with various samples from IDC, and it works great!

I am very tempted to just merge it and get this out into Slicer. We can address refinements in subsequent work. What do you think?

It should be evaluated whether the byte-wise comparison method also works for rows * cols % 8!=0 cases, or, at least with some extra code.

Indeed, it will require some extra code, since AND is applied to frames before unpacking. Such frame sizes are not common. For the record, I used the query below to search IDC for such segmentations:

SELECT
  collection_id,
  PatientID,
  StudyInstanceUID,
  SeriesInstanceUID,
  `Rows`,
  `Columns`,
  ARRAY_LENGTH(SegmentSequence) AS numberOfSegments
FROM
  `bigquery-public-data.idc_current.dicom_all`
WHERE
  Modality="SEG"
  AND MOD(`Rows`*`Columns`, 8) <> 0
ORDER BY
  Collection_id desc

and only three collections contain such samples, all of them containing just single segment. So I do not think it makes sense to spend too much time on this, or delay integration of this PR.

@michaelonken
Copy link
Member

Thanks for testing @fedorov ! Nice SQL query :-)

For the MOD 8 issue: I agree, let's skip it for now. I will look into it if there is some time left. I thought about it and I think the bit-wise comparison will also work for the other frame sizes as long as the "margin" bytes at the beginning and at the end of the frame, which may contain overlapping bits from other frames, are masked out correctly first. This is a bit intricate so I don't want to implement this quickly without proper unit tests.

The one remaining thing that we might to guard against are oblique slices. If I understand correctly, in those Series the position (Image Position Patient) does not only vary in one (e.g. z) coordinate between slices, but also in the other two coordinates (x,y). The overlap detection code right now superimposes frames that are on the same z coordinate, without "aligning" them on their x,y coordinate. This way pixels are compared that might have totally different positions in 3D space, eventually leading to broken merges or non-merges of segments. The right way would be to accommodate for the x,y shift before doing the comparison. Probably it is just some "simple" matrix operation(s) but nothing I could shake off my sleeve. As a workaround, one might try to detect oblique slicing and then fallback to the non-merging operation.

What do you think?

@fedorov
Copy link
Member Author

fedorov commented Nov 16, 2023

@michaelonken I spent some time looking at the code in question, which as I understand is around here: https://github.com/fedorov/dcmqi/blob/merged-segments/libsrc/OverlapUtil.cpp#L736-L761.

I think this logic could be significantly simplified. Instead of trying to detect relevantCoordinate, we can use the existing FrameSorter class to order frames by their position along the ImageOrientationPatient vector, we already have the function for this: https://github.com/fedorov/dcmqi/blob/merged-segments/include/dcmqi/framesorter.h#L230. Given that ordering, I think we should use the difference between the values of dist here https://github.com/fedorov/dcmqi/blob/merged-segments/include/dcmqi/framesorter.h#L268 to detect overlapping frames, applying the same threshold logic. I am pretty sure this should be robust to any orientation.

@michaelonken
Copy link
Member

michaelonken commented Nov 16, 2023

@fedorov I can check this in detail Tuesday next week. From just taking a quick look I am not sure whether the frame sorting code you referenced takes into account the x,y coordinate shifting that I mentioned. It's clear, also for oblique cases, how to identify the relevant coordinate (using Image Orientation Patient as the framesorter code and the OverlapUtil code does).

The problem I see is in the shifting of the other two coordinates, so that if two frames are on the same "plane" (e.g. z coordinate, "relevant coordinate"), we cannot simply compare value for "pixel" 1,1 from frame A to the 1,1 from frame B since Frame B might start 2 cm more to the right (e.g. x direction) and 1 cm more in y-direction. Here is a very sophisticated picture to demonstrate my bad explanation :-)

grafik

@fedorov
Copy link
Member Author

fedorov commented Nov 16, 2023

Michael, I do not think this would be the case for any volumetric acquisition. In oblique scans, the frames are not shifted when observed along the scan direction. What you show would be the case if one projected oblique acquisition slices to a plane.

@michaelonken
Copy link
Member

michaelonken commented Nov 16, 2023

Ah, thank you. Sorry for the noise then. And we can always assume that we don't have such kind of shifts in any acquisitions we handle?

I think then we are safe to go with the current code. We could simplify by using the existing framesorter code (that I just didn't know) but maybe this can be a second step a bit later if you want to merge.

P.S: Note that Frame A and Frame B are usually parts of two different segments, not of a real "acquisition" of a modality, so a segmentation creator could in theory at least place frames in space where he wants to; this is what I am afraid of.

@fedorov
Copy link
Member Author

fedorov commented Nov 16, 2023

P.S: Note that Frame A and Frame B are usually parts of two different segments, not of a real "acquisition" of a modality, so a segmentation creator could in theory at least place frames in space where he wants to; this is what I am afraid of.

This may be a valid scenario in theory, but it would be crazy, and in any case it is probably not handled in dcmqi in other places. I am not sure. I have never seen such cases. Would be nice to add an error check for this or something like this!

I will go ahead merging this, to make this functionality easier accessible to the users, and make it easier to test and identify deficiencies!

@fedorov fedorov marked this pull request as ready for review November 16, 2023 21:24
@fedorov fedorov merged commit a67c6c7 into QIICR:master Nov 16, 2023
5 of 6 checks passed
fedorov added a commit to fedorov/MITK that referenced this pull request Dec 5, 2023
There are two very important improvements in this updated version:

1. Performance of conversion is improved perhaps by at least an order of magnitude, cutting down the conversion for large DICOM SEG (such as TotalSegmentator results) from minutes to seconds.
2. It is now possible to save non-overlapping segments into a single file instead of previous convention of saving each segment as a separate file. See details in QIICR/dcmqi#464. Note that to use this feature you will need to use the --merge flag (there may be changes needed to the MITK dcmqi wrapper - I am sure you will test this - but there are no changes to the JSON schema)
mitk-bot pushed a commit to MITK/MITK that referenced this pull request Jun 11, 2024
There are two very important improvements in this updated version:

1. Performance of conversion is improved perhaps by at least an order of magnitude, cutting down the conversion for large DICOM SEG (such as TotalSegmentator results) from minutes to seconds.
2. It is now possible to save non-overlapping segments into a single file instead of previous convention of saving each segment as a separate file. See details in QIICR/dcmqi#464. Note that to use this feature you will need to use the --merge flag (there may be changes needed to the MITK dcmqi wrapper - I am sure you will test this - but there are no changes to the JSON schema)
mitk-bot pushed a commit to MITK/MITK that referenced this pull request Jun 11, 2024
Summary:
Upgrade DCMQI to 1.3.0

There are two very important improvements in this updated version:

1. Performance of conversion is improved perhaps by at least an order of magnitude, cutting down the conversion for large DICOM SEG (such as TotalSegmentator results) from minutes to seconds.
2. It is now possible to save non-overlapping segments into a single file instead of previous convention of saving each segment as a separate file. See details in QIICR/dcmqi#464. Note that to use this feature you will need to use the --merge flag (there may be changes needed to the MITK dcmqi wrapper - I am sure you will test this - but there are no changes to the JSON schema)

Update dcmqi to 1.3.1

Bump to DCMQI v1.3.2

Adapt to new DCMQI interface

+ fixed T30286

Fixed error in label generation when loading multiple groups from DICOM Seg

Fixed error in assigning the label name when DCM Seg is loaded

Added support for SegmentOverlap tag.

If segment overlap == NO then all segements are added as label of one
group. In all other cases each segment is added as new group (old
behavior).

+ fixed memory leak

Test Plan:
code review, test to load and save segs as/from DCM Seg. Segmentations of a group in MITK should stay now a group also after loading.
Loading old/other DCM seg results in a group per segment/label.

Reviewers: O1 MITK Reviewer Group I, kislinsk

Reviewed By: O1 MITK Reviewer Group I, kislinsk

Subscribers: kislinsk

Maniphest Tasks: T30286

Differential Revision: https://phabricator.mitk.org/D966
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants