
Anonymize DICOM image data by detecting text and replacing it with black squares


mmiv-center/RewritePixel


Sanitize DICOM image data with text annotations

This project uses the Tesseract OCR engine (version > 4.0) to identify text that is burned into DICOM image data. For each detected text fragment (usually a word), a black box covering the fragment is written into the DICOM pixel data. Inspect the resulting DICOM files by hand: they are hopefully, but not guaranteed to be, free of participant-identifying information.
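The masking step can be pictured with the following C++ sketch. It assumes the OCR stage has already produced word bounding boxes in pixel coordinates and that the image is a single-frame, single-channel 16-bit buffer stored row by row. The `Box` type and the `blackOutBoxes` function are hypothetical names for illustration, not RewritePixel's actual code. Every pixel inside each box is set to 0, which is black for MONOCHROME2 images (for MONOCHROME1 images, black is the maximum value instead).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not taken from the RewritePixel sources.
// Covers the pixels [left, right) x [top, bottom).
struct Box {
    int left, top, right, bottom;
};

// Set every pixel inside each box to 0 (black for MONOCHROME2 images) in a
// single-frame, single-channel 16-bit image stored row by row.
void blackOutBoxes(std::vector<std::uint16_t>& pixels, int width, int height,
                   const std::vector<Box>& boxes) {
    assert(width >= 0 && height >= 0);
    assert(pixels.size() ==
           static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    for (const Box& b : boxes) {
        // Clamp to the image so bad OCR coordinates cannot write out of bounds.
        const int x0 = std::clamp(b.left, 0, width);
        const int x1 = std::clamp(b.right, 0, width);
        const int y0 = std::clamp(b.top, 0, height);
        const int y1 = std::clamp(b.bottom, 0, height);
        if (x0 >= x1 || y0 >= y1) {
            continue;  // empty box, or entirely outside the image
        }
        for (int y = y0; y < y1; ++y) {
            const auto row = pixels.begin() + static_cast<std::ptrdiff_t>(y) * width;
            std::fill(row + x0, row + x1, std::uint16_t{0});
        }
    }
}
```

Clamping each box to the image keeps a malformed OCR coordinate from writing outside the pixel buffer; boxes that end up empty after clamping are skipped.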

Warning: This program does not anonymize DICOM tags. For a fast tag anonymizer, see the https://github.com/mmiv-center/DICOMAnonymizer project.

Warning: There is no information yet on detection error rates (false positives or false negatives). Verify the output by hand!

Build

We use CMake to generate a Makefile for compilation. The program depends on several libraries (GDCM, Tesseract); the Dockerfile is the best reference for how to build it.

# in the best of all worlds this is sufficient to create the build system
cmake -DCMAKE_BUILD_TYPE=Debug .
make

Using docker:

> docker build -t rewritepixel -f Dockerfile .
...
> docker run -it --rm rewritepixel 
USAGE: rewritepixel [options]

Options:
  --help              Rewrite DICOM images to remove text. Read DICOM image
                      series and write out an anonymized version of the image
                      data.
  --input, -i         Input directory.
  --output, -o        Output directory.
  --confidence, -c    Confidence threshold (0..100).
  --numthreads, -t    How many threads should be used (default 4).
  --storemapping, -m  Store the detected strings as a JSON file.

Examples:
  rewritepixel --input directory --output directory
  rewritepixel --help
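The `--confidence` option takes a value between 0 and 100, the same scale on which Tesseract reports per-word recognition confidence. The C++ sketch below illustrates one plausible use of such a threshold. It assumes, without confirmation from the source, that words at or above the threshold are masked and lower-confidence words are left untouched; under that assumption, a lower threshold masks more text, including OCR noise. `Word`, `Box` and `boxesToMask` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not RewritePixel's own code.
struct Box {
    int left, top, right, bottom;
};

struct Word {
    std::string text;
    float confidence;  // 0..100, the scale Tesseract uses for word confidences
    Box box;
};

// Return the boxes of all words whose confidence reaches the threshold.
// Assumption: words at or above --confidence are masked, lower ones ignored.
std::vector<Box> boxesToMask(const std::vector<Word>& words, float threshold) {
    assert(threshold >= 0.0f && threshold <= 100.0f);
    std::vector<Box> result;
    for (const Word& w : words) {
        if (w.confidence >= threshold) {
            result.push_back(w.box);
        }
    }
    return result;
}
```

With this reading, a threshold of 0 masks every detected fragment, while 100 masks only fragments Tesseract is fully confident about.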

Notice: Docker containers do not automatically see your system's directories. Use the '-v' option to mount a host folder inside the container before you access data stored on your system. In the following example, the data folders 'test_input' and 'test_output' are in the Documents folder of the current user's home directory, which is mounted as /data:

docker run -it -v /home/<user name>/Documents/:/data --rm rewritepixel -i /data/test_input/ -o /data/test_output/
