
CRLM processor

Tools to analyze, process and augment the CRLM dataset

CRLM samples: A, B) images with a large number of annotations. C, D) sparsely annotated images.

Installation

  1. Add the following line to your requirements file:

    gcrlm @ https://github.com/giussepi/gcrlm/tarball/main

    and run pip install -r requirements.txt

    Alternatively, install the package directly with either of the following commands:

    pip install git+git://github.com/giussepi/gcrlm.git --use-feature=2020-resolver --no-cache-dir

    pip install https://github.com/giussepi/gcrlm/tarball/main --use-feature=2020-resolver --no-cache-dir
  2. Make a copy of the configuration file and update it properly

    cp settings.py.template settings.py

  3. Update your main.py following the file main.py.template
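
The template main.py is not reproduced in this README; as a rough orientation, a minimal main.py built only from the classes documented below could look like the sketch that follows (the path and the chosen operations are placeholders, not the actual template contents):

# main.py -- illustrative sketch only; follow main.py.template for the real structure
from gcrlm.managers import CRLM


def main():
    # point fileroot to your local copy of the annotated CRLM dataset
    tc = CRLM(index=1, fileroot='/path/to/Annotated - Training set/')
    tc.plot_slide()


if __name__ == '__main__':
    main()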

Main Features

Basic Operations

Use the CRLM class to review the ndpi files, extract annotations and bounding boxes, plot annotations and slides, extract masks, create annotation images with some data augmentation, etc. E.g.:

from gcrlm.managers import CRLM

tc = CRLM(index=1, fileroot='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set/')
tc.plot_slide()
name, tmpim, mask = tc.extract_annotation_image(1, plot=True)

Plot a ROI along with its Annotations

from gcrlm.managers import CRLM

tc = CRLM(index=1, fileroot='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set/')
tc.plot_roi(tc.all_rois[0], line_width=2.5)

Split Dataset into train, validation and test

from gcrlm.processors import CRLMSplitDataset, CRLMSmartSplitDataset

# random split
CRLMSplitDataset(
    dataset_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set',
    val_size=.1, test_size=.2
)()

# stratified split
CRLMSmartSplitDataset(
    dataset_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set',
    val_size=.1, test_size=.2
)()

Create Annotation Crops and Masks from one WSI (.ndpi and .ndpa)

Use CRLM.extract_annotations_and_masks to create the annotation crops and their masks:

from gcrlm.managers import CRLM

tc = CRLM(index=1, fileroot='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set/')

# without considering ROIs (single annotation per crop)
tc.extract_annotations_and_masks(level=4, min_mask_area=.000001, apply_image_transforms=False)

# Considering all the ROIs in the image (single annotation per crop). See first mask below.
tc.extract_annotations_and_masks(
    level=4, min_mask_area=.000001, apply_image_transforms=False,
    roi_clusters=tc.cluster_annotations_by_roi(), inside_roi=True,
)

# Considering multiple annotations of the same class per crop. See second mask below.
tc.extract_annotations_and_masks(
    level=4, min_mask_area=.000001, apply_image_transforms=False,
    roi_clusters=tc.cluster_annotations_by_roi(), inside_roi=True,
    multiple_ann=True
)

# Considering multiple annotations from all the classes per crop. See third mask below.
tc.extract_annotations_and_masks(
    level=4, min_mask_area=.000001, apply_image_transforms=False,
    roi_clusters=tc.cluster_annotations_by_roi(), inside_roi=True,
    multiple_ann=True, multiple_classes=True
)

Example image crop:

Example generated masks:

Verifying that one Mask was Correctly Cropped

  1. Create masks from any slide. E.g.: CRLM 11.ndpi
tc = CRLM(index=11, fileroot='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set/')

# create the crops and masks considering ROIs (single annotation per crop, using the main annotation)
tc.extract_annotations_and_masks(
    level=4, min_mask_area=.000001, apply_image_transforms=False,
    roi_clusters=tc.cluster_annotations_by_roi(), inside_roi=True
)

# plot multiple annotations of the same class along with the main one
tc.extract_annotation_image(291, 2000, 2000, plot=True, level=4, ann_mgr=tc.get_ann_mgr())

# plot multiple annotations from multiple classes along with the main annotation
tc.extract_annotation_image(
    291, 2000, 2000, plot=True, level=4, ann_mgr=tc.get_ann_mgr(), multiple_classes=True)
  2. Open the folder annotations_masks, created at the parent directory of Annotated - Training set/, and select a mask to be verified. E.g.: for C_f011_r02_a00100_c00401.mask.png the meaning of each part of its name is defined below:

<label>_f<NDPI file number>_r<roi number>_a<annotation number>_c<crop counter>.mask.png

Note: When the roi number is -1, it means that the annotation is not inside any ROI.
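
For scripting, the naming convention can be parsed with a small regular expression. The helper below is only an illustrative sketch based on the pattern above (parse_mask_name is not part of gcrlm):

import re

# illustrative helper, not part of gcrlm; it only mirrors the naming pattern above
MASK_NAME_RE = re.compile(
    r'^(?P<label>.+)_f(?P<file>\d+)_r(?P<roi>-?\d+)_a(?P<annotation>\d+)_c(?P<crop>\d+)\.mask\.png$'
)


def parse_mask_name(filename):
    """Return the parts of a mask filename as a dict (label kept as str, the rest as ints)."""
    match = MASK_NAME_RE.match(filename)
    if match is None:
        raise ValueError(f'{filename} does not follow the expected naming convention')
    parts = match.groupdict()
    return {key: (value if key == 'label' else int(value)) for key, value in parts.items()}


# e.g. parse_mask_name('C_f011_r02_a00100_c00401.mask.png')
# -> {'label': 'C', 'file': 11, 'roi': 2, 'annotation': 100, 'crop': 401}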

  3. Use the ROI number and the annotation number to plot them together.
tc.plot_roi(2, annotations=[100], line_width=2.5, level=4)

Note: If your annotation is not inside a ROI, its roi value is -1. In that case, you can still plot the image using the following line:

_, _, _ = tc.extract_annotation_image(100, plot=True, level=4)
  4. Make a visual comparison of the plotted image with the mask and image patch. They should be a perfect match.
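
If you prefer a programmatic check, the crop and its mask can be overlaid with matplotlib. The snippet below is only a sketch: the crop filename is a guess placed next to the mask, so adjust both paths to the files you actually want to compare:

import matplotlib.pyplot as plt
from PIL import Image

# illustrative sketch: replace both paths with the actual crop/mask pair to verify
crop = Image.open('annotations_masks/C_f011_r02_a00100_c00401.png')  # crop filename is an assumption
mask = Image.open('annotations_masks/C_f011_r02_a00100_c00401.mask.png')

plt.imshow(crop)
plt.imshow(mask, cmap='gray', alpha=.4)  # semi-transparent mask on top of the crop
plt.axis('off')
plt.show()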

Analyze the CRLM dataset

Use the CRLMDatasetAnalyzer class to get a quick analysis of the CRLM dataset

from gcrlm.processors import CRLMDatasetAnalyzer

CRLMDatasetAnalyzer(
    images_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set/'
)()

Create Annotation Crops and Masks from the whole CRLM Dataset

Use the CRLMDatasetProcessor data processor. This class can process the CRLM dataset in three different ways: using only ROIs, using all the annotations, and using all the annotations while considering the ROIs when available. See gcrlm.constants.CRLMProcessingTypes for a complete description of each processing type.

from gcrlm.constants import CRLMProcessingTypes
from gcrlm.processors import CRLMDatasetProcessor

CRLMDatasetProcessor(
    level=4,
    images_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set/'
)(
    min_mask_area=.000001, processing_type=CRLMProcessingTypes.ALL_WITH_ROIS,
    multiple_ann=True, multiple_classes=True
)
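
Only ALL_WITH_ROIS is shown in this README; if you want to double-check the other available processing types before launching a long run, a quick inspection of the constants class works:

from gcrlm.constants import CRLMProcessingTypes

# list the public attributes of the constants class; their exact names are defined by gcrlm
print([attr for attr in dir(CRLMProcessingTypes) if not attr.startswith('_')])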

Quantify created crops

Quantify the crops per label with the CRLMCropsAnalyzer class

from gcrlm.processors import CRLMCropsAnalyzer

ann_counter = CRLMCropsAnalyzer(
    crops_masks_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/annotations_masks/'
)()
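
The returned ann_counter holds the per-label crop counts, so a quick way to review them is simply to print it:

# review the per-label counts gathered by the analyzer
print(ann_counter)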

Perform data augmentation over the crops & masks per label/class

Use the CRLMAugmentationProcessor to multiply the number of images/crops per class at will. E.g.:

from gcrlm.processors import CRLMAugmentationProcessor, CRLMCropsAnalyzer
from gcrlm.core.models import AugmentDict

# using custom class multipliers
# the following code will only duplicate the number of crops and masks from the class foreign_body
CRLMAugmentationProcessor(
    images_masks_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/annotations_masks/',
    augment_dict=AugmentDict(
        hepatocyte=1,
        necrosis=1,
        fibrosis=1,
        tumour=1,
        inflamation=1,
        mucin=1,
        blood=1,
        foreign_body=2,
        macrophages=1,
        bile_duct=1
    )
)()

# calculating class multipliers automatically
CRLMAugmentationProcessor(
    images_masks_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/annotations_masks/',
)()

# multiplying the whole dataset times 2
CRLMAugmentationProcessor(
    images_masks_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/annotations_masks/',
    class_multiplier=2,
)()

# Verifying/reviewing the results by using the CRLMCropsAnalyzer class

CRLMCropsAnalyzer(crops_masks_path='augmented_dataset')()

Randomly Sample Crop Subdatasets

Use the CRLMCropRandSample class to randomly sample your crop subdatasets using a user-defined number of crops/images & masks per label/class.

from gcrlm.processors import CRLMCropRandSample

CRLMCropRandSample(samples_per_label=1000, dataset_path='/path/to/my/crop_subdataset/')()

Split, Create Crops and Augment in a Single Step

With the CRLMSplitProcessAugment class you can do all the previously mentioned operations at once. However, you should really know the right parameters, because processing the whole CRLM dataset takes some hours and you don't want to see errors in the middle of it. Thus, we recommend getting some experience with the CRLMSplitDataset, CRLMDatasetProcessor and CRLMAugmentationProcessor classes before using CRLMSplitProcessAugment.

The most important points to bear in mind are:

  1. All the new folders will be created inside the dataset_path.
  2. When creating the folder splits (train, val, test), the images and masks will be moved to their corresponding folders.
  3. The crops and their masks will be created in the folders train_processed, val_processed and test_processed.
  4. The augmentation of images inside the train_processed folder will be saved at the train_processed_augmented folder.
  5. If you want to run CRLMSplitProcessAugment again then you have to do the following (see the cleanup sketch after the configuration examples below):
    1. Move all the images and masks from the train, val and test folders to their parent folder. I.e. if the train, val and test folders are inside /giussepi/CRLM/ (this is the original dataset_path), then move the images and masks to your dataset_path.
    2. Remove all the folders inside your dataset_path. Remember not to remove the images you moved in the previous step.
    3. Prepare your CRLMSplitProcessAugment configuration and run it!
from gcrlm.processors import CRLMSplitProcessAugment
from gcrlm.core.models import AugmentDict

CRLMSplitProcessAugment(
    dataset_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set (copy)',
    val_size=.1, test_size=.2, level=4, min_mask_area=.000001, multiple_ann=True, multiple_classes=True,
    augment_dict=AugmentDict(
        hepatocyte=1,
        necrosis=2,
        fibrosis=1,
        tumour=1,
        inflamation=2,
        mucin=5,
        blood=7,
        foreign_body=17,
        macrophages=6,
        bile_duct=2
    )
)()

# calculating class multipliers automatically
CRLMSplitProcessAugment(
    dataset_path='/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set (copy)',
    val_size=.1, test_size=.2, level=4, min_mask_area=.000001, multiple_ann=True, multiple_classes=True
)()
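
The cleanup described in point 5 can be scripted. The following is only an illustrative sketch that assumes the folder names listed above (train, val, test, train_processed, val_processed, test_processed and train_processed_augmented) and that everything directly inside train, val and test should go back to dataset_path; double-check it against your actual layout before deleting anything:

import shutil
from pathlib import Path

# illustrative cleanup sketch; verify the folder names and contents before running it
dataset_path = Path('/media/giussepi/2_0_TB_Hard_Disk/improved_CRLM_dataset/Annotated - Training set (copy)')

# 1. move images and masks from the split folders back to dataset_path
for split in ('train', 'val', 'test'):
    split_dir = dataset_path / split
    if split_dir.is_dir():
        for item in split_dir.iterdir():
            shutil.move(str(item), str(dataset_path / item.name))

# 2. remove the folders created by CRLMSplitProcessAugment
for folder in ('train', 'val', 'test', 'train_processed', 'val_processed',
               'test_processed', 'train_processed_augmented'):
    shutil.rmtree(dataset_path / folder, ignore_errors=True)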

Randomly split a dataset of crops into train, val and test subdatasets

from gcrlm.processors import CRLMRandCropSplit

CRLMRandCropSplit(dataset_path='<path_to_my_crops_dataset>', val_size=.1, test_size=.1)()

Logging

This application uses logzero, so some functionalities can print extra data. To enable this, just open your settings.py and set DEBUG = True. By default, the log level is set to logging.INFO.
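
For reference, the relevant part of settings.py might look like the excerpt below (a sketch only; your copy of settings.py.template may contain more options):

# settings.py (excerpt) -- illustrative sketch, your settings.py.template may differ
DEBUG = True  # enables the extra logzero output; by default the log level stays at logging.INFO

# optional: adjust the logzero level manually in your own scripts if needed
# import logging
# import logzero
# logzero.loglevel(logging.DEBUG)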
