We suggest using Miniconda as the package manager for your system. Create and activate the conda environment with:
conda env create -f environment.yml
conda activate wsi-pre2
The main branch is protected. Please check out your own development branch and open a draft Pull Request to propose changes.
This repository uses pre-commit hooks and GitHub Actions for code quality, inspired by the Lightning Hydra Template.
The pre-commit library comes installed with the conda environment. You should then set up pre-commit, which uses the hooks from .pre-commit-config.yaml:
pre-commit install
After that, your code will be automatically reformatted on every new commit.
To reformat all files in the project, run:
pre-commit run -a
To update the hook versions in .pre-commit-config.yaml, run:
pre-commit autoupdate
The main script to run is tile_generator.py. We provide configs in the configs/ folder which generate tables of patch locations with the corresponding pixel sizes. The tables are then stored as .csv files for each slide in the configured output_path.
By default, multiprocessing is enabled so that multiple slides can be processed simultaneously.
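As a rough illustration of how slide-level multiprocessing and the blocked_threads config entry could interact, here is a minimal sketch. It is not the code in tile_generator.py: the function names `worker_count`, `process_slide`, and `process_all` are hypothetical, and the rule "workers = available CPUs minus blocked_threads" is an assumption based on the parameter description.

```python
import os
from multiprocessing import Pool


def worker_count(blocked_threads, total=None):
    """Number of worker processes, leaving `blocked_threads` CPUs unused.

    Assumption: the pool size is the CPU count minus blocked_threads,
    never dropping below one worker.
    """
    if total is None:
        total = os.cpu_count() or 1
    return max(1, total - blocked_threads)


def process_slide(slide_path):
    """Hypothetical placeholder for the per-slide tiling work."""
    return slide_path, "done"


def process_all(slide_paths, blocked_threads=0):
    """Process several slides in parallel, one slide per task."""
    with Pool(processes=worker_count(blocked_threads)) as pool:
        return pool.map(process_slide, slide_paths)
```

On an 8-core machine, `worker_count(2, total=8)` would give 6 workers under this assumption, so two cores stay free for other tasks.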
As an example, the tiling of TCGA slides with patch_size=256 can be started as follows:
python tile_generator.py --config configs/tcga-crc_256.json
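A config file like the one passed via --config could also be produced from Python. The sketch below is only an illustration: it uses parameter names and default values documented in this README, but the flat key layout, the helper names `render_config` and `write_config`, and the placeholder output directory are assumptions. The configs shipped in configs/ may group keys differently, so check an existing file before relying on this.

```python
import json
from pathlib import Path

# Defaults as documented in this README; the flat layout is an assumption.
BASE_CONFIG = {
    "patch_size": 256,
    "tissue_coverage": 0.8,
    "processing_level": 3,
    "remove_top_border": False,
    "save_patches": False,
    "zip_patches": False,
    "metadata_format": "csv",
    "output_path": "output/",  # hypothetical placeholder directory
}


def render_config(overrides=None):
    """Return the config as a JSON string, with optional overrides applied."""
    config = dict(BASE_CONFIG)
    config.update(overrides or {})
    return json.dumps(config, indent=2)


def write_config(path, overrides=None):
    """Write the rendered config to `path`, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_config(overrides))
    return path
```

For instance, `write_config("configs/my_dataset_512.json", {"patch_size": 512})` would write a variant with larger patches; the file name here is made up for the example.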
The table shows descriptions for the most important config parameters:
Dictionary Entry | Description |
---|---|
check_resolution | Perform a resolution check of all slides before extracting patches |
use_tissue_detection | Toggle the activation of tissue detection |
remove_top_border | Remove the top border of the slide. Useful for Camelyon slides. Default is false |
save_patches | Save the extracted patches as image files. Older pipelines stored patches; in this project the default is false |
zip_patches | Experimental: zip the patch image directories to test whether this increases transfer speeds. Default is false |
tissue_coverage | Threshold in [0,1] for the minimum tissue coverage required. Default is 0.8 |
processing_level | Downscaling level used by OpenSlide. Lowering the level increases precision but takes more time. Default is 3 |
blocked_threads | Number of threads that won't be used by the program |
patches_per_tile | Number of patches used for lower resolution operations like tissue detection |
overlap | Value [0,1] to set the overlap between neighbouring unannotated patches |
annotation_overlap | Value [0,1] to set the overlap between neighbouring annotated patches |
patch_size | Output size in pixels of the square patches |
calibration | Calibration settings for physical units (see the entries below) |
use_non_pixel_lengths | Activate calibration and use micrometers instead of pixels |
patch_size_microns | Specify the patch size in micrometers. At 0.25 |
resize | Whether to resize the patches in micrometers to the given patch_size |
dataset | Provide name for the dataset |
slides_dir | Directory where the different slides and subdirs are located |
slideinfo_file | Provide a .csv file with filenames and labels |
annotation_dir | Directory where the annotations are located |
annotation_file_format | File format of the input annotations ("xml","geojson") |
output_path | Output directory where the resulting files are stored |
skip_unlabeled_slides | Boolean to skip slides without an annotation file |
save_annotated_only | Boolean to only save annotated patches |
output_format | Image output format. Either "jpeg" or "png" |
metadata_format | Format in which slide metadata is stored. Default is "csv" |
write_slideinfo | Write information about the processed slide |
show_mode | Boolean to enable plotting of some intermediate results/visualizations |
label_dict | Structure to set up the operator and the threshold for checking the coverage of a certain class |
type | Operator type ["==", ">=", "<="] |
threshold | Coverage threshold for the individual class |
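
To make the overlap and label_dict entries more concrete, the following sketch shows one plausible reading of them. It is not taken from tile_generator.py: the helper names `patch_origins`, `class_matches`, and `assign_label` are hypothetical, the stride rule `patch_size * (1 - overlap)` is an assumption, and the label_dict layout (class name mapped to a `type` and a `threshold`) is inferred from the table above.

```python
import operator

# Operator strings as listed in the table; the mapping is an assumption.
OPERATORS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def patch_origins(length, patch_size, overlap):
    """Top-left coordinates along one axis for patches with fractional overlap.

    Assumption: neighbouring patches are shifted by patch_size * (1 - overlap),
    with a minimum shift of one pixel.
    """
    stride = max(1, int(round(patch_size * (1.0 - overlap))))
    return list(range(0, length - patch_size + 1, stride))


def class_matches(coverage, rule):
    """Check one class coverage value against its operator and threshold."""
    return OPERATORS[rule["type"]](coverage, rule["threshold"])


def assign_label(coverages, label_dict):
    """Return the first class whose rule is satisfied, or None."""
    for name, rule in label_dict.items():
        if class_matches(coverages.get(name, 0.0), rule):
            return name
    return None
```

Under these assumptions, an axis of 1024 pixels with patch_size=256 yields four patch positions at overlap 0 and seven at overlap 0.5, and a patch with 90 % tumor coverage satisfies a rule of type ">=" with threshold 0.8.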