This repository contains the deliverables of a course project on using MONAI and SimpleITK to perform semantic segmentation on the LIVECell dataset.
The project overview is as follows:
- Preprocess the LIVECell dataset to make it suitable for semantic segmentation.
- Establish a baseline using SimpleITK adaptive thresholding algorithms.
- Use MONAI to train variations of UNet, VNet, and SwinUNETR.
- Evaluate performance using Dice and IoU metrics, and explore performance on measuring cell confluence.
- Environment Setup
- Dataset
- Preprocessing
- Baseline
- Deep Learning
- Results
The environment is described in detail in the environment.yml file. The project was run on a Windows 11 machine with an RTX 3080 GPU and the appropriate NVIDIA drivers installed.
Using conda, you can create the environment as follows:

```bash
conda env create -n livecellseg -f environment.yml
```

Then activate it as follows:

```bash
conda activate livecellseg
```
Skip this section if you want to use the tiny subset already included in the repository.
The dataset used is LIVECell. It is distributed in many formats and sub-experiments; we require four files to be downloaded:
Visit the original LIVECell repository for updated links if the links above do not work.
The code assumes by default that the dataset is downloaded with the following structure:

```text
<repository>/dataset
.../images
.../livecell_coco_train.json
.../livecell_coco_val.json
.../livecell_coco_test.json
```
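Before running any of the scripts, it can help to verify that the dataset folder matches the layout above. The following is a minimal sketch (not part of the repository's scripts) that reports any missing entries; the function name `missing_entries` is my own:

```python
from pathlib import Path

# Entries the code expects directly under the dataset root.
EXPECTED = [
    "images",
    "livecell_coco_train.json",
    "livecell_coco_val.json",
    "livecell_coco_test.json",
]

def missing_entries(root):
    """Return the expected dataset entries that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]
```

`missing_entries("dataset")` returns an empty list when the layout is correct.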
You might notice that the repository already contains the above structure: for convenience, 1% of LIVECell was randomly sampled using sample_subset.py
and uploaded for testing the code. If you want to use the whole dataset, replace the contents of the dataset folder with the full versions downloaded from the links above.
We must convert the LIVECell instance segmentation labels to semantic segmentation masks. For all sets, run the following command:

```bash
python instance2semantic.py
```
You should see the folders ./dataset/livecell_train_val_masks
and ./dataset/livecell_test_masks
generated, containing black-and-white images in which white represents cells and black the background.
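The core of the instance-to-semantic conversion is taking the union of all per-instance masks for one image. The internals of instance2semantic.py are not shown here, so the following is only an illustrative sketch that assumes the per-instance masks have already been decoded into nested 0/1 lists:

```python
def merge_instance_masks(instance_masks):
    """Union per-instance binary masks (nested 0/1 lists of equal shape)
    into a single semantic mask: 1 = cell, 0 = background."""
    if not instance_masks:
        return []
    h, w = len(instance_masks[0]), len(instance_masks[0][0])
    semantic = [[0] * w for _ in range(h)]
    for mask in instance_masks:
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    semantic[y][x] = 1
    return semantic
```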
To ensure more accurate results, refer to my GitHub issue for a list of duplicate images as well as images shared between the training and validation sets. It is recommended that you remove those images before proceeding with training.
Once you have the semantic segmentation masks, it is easy to skim through the directory, spot outlier masks (masks containing only a single cell), and remove them along with their corresponding images and annotations.
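If you would rather detect single-cell outliers programmatically than by eye, counting connected components in each mask is enough. A minimal pure-Python sketch (the repository itself does not include this helper) using 4-connectivity:

```python
from collections import deque

def count_cells(mask):
    """Count 4-connected foreground components in a nested 0/1 list mask.
    A mask with count 1 is a candidate outlier to remove."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                q = deque([(y, x)])
                seen[y][x] = True
                while q:  # breadth-first flood fill of one component
                    cy, cx = q.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
    return count
```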
For the baseline, I tested four adaptive thresholding algorithms, with and without binary morphology, using SimpleITK.
If you have completed the previous steps, you can simply run the notebook baseline.ipynb
to obtain the baseline results. It contains comments and sections to guide you.
If you have not changed the location of the dataset, the notebook should run from start to finish without errors.
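To illustrate what such a thresholding baseline does, here is a sketch of Otsu's method, one classic algorithm that SimpleITK exposes as a filter; this is a pure-Python reimplementation of the idea for illustration, not the notebook's code, and is not necessarily one of the four algorithms tested:

```python
def otsu_threshold(pixels):
    """Otsu's method over a flat list of integer intensities in [0, 255]:
    pick the cut that maximizes the between-class variance."""
    n = len(pixels)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        lo = [p for p in pixels if p < t]   # background class
        hi = [p for p in pixels if p >= t]  # foreground class
        if not lo or not hi:
            continue
        w0, w1 = len(lo) / n, len(hi) / n       # class weights
        m0, m1 = sum(lo) / len(lo), sum(hi) / len(hi)  # class means
        var = w0 * w1 * (m0 - m1) ** 2          # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Pixels at or above the returned threshold are labeled as cell, the rest as background.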
For the deep learning models, I tested different configurations of UNet, VNet, and SwinUNETR. The best trained model (SwinUNETR) is uploaded in models/.
All steps to reproduce a trained model like the one uploaded are available in the notebook deep_learning.ipynb.
It is set to run for 1 epoch by default, just so that you can quickly run it to completion on the subset for testing.
The notebook also contains testing and inference code as well as dataset exploration and visualizations.
Again, if you have not changed the location of the dataset, the notebook should run from start to finish without errors.
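For reference, the Dice and IoU metrics used throughout the evaluation reduce to simple overlap ratios on binary masks. A minimal sketch (MONAI provides these as `DiceMetric` and `MeanIoU`; this standalone version is only for illustration):

```python
def dice_iou(pred, target):
    """Dice and IoU for two flat binary masks (0/1 lists of equal length).
    Dice = 2|A∩B| / (|A| + |B|),  IoU = |A∩B| / |A∪B|."""
    inter = sum(p & t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou
```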
SCH refers to using a cosine annealing learning rate scheduler.
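The cosine annealing schedule decays the learning rate from its initial value toward a minimum along a half cosine. A sketch of the formula (the same schedule PyTorch implements as `torch.optim.lr_scheduler.CosineAnnealingLR`):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine annealing: lr_max at step 0, lr_min at step total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))
```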
The best UNet model had the following architecture:
```python
from monai.networks.layers import Norm
from monai.networks.nets import UNet
from torch.optim import Adam

# UNet 6L-3R (36.9 MB)
model = UNet(
    spatial_dims=2,
    in_channels=1,
    out_channels=1,
    channels=(16, 32, 64, 128, 256, 512),
    strides=(2, 2, 2, 2, 2),
    num_res_units=3,
    norm=Norm.BATCH,
)
optimizer = Adam(model.parameters(), lr=1e-4)
learning_rate_scheduler = None
```
Confluence RMSE:

| Architecture | Train | Validation | Test |
|---|---|---|---|
| UNet-6L-3R | 7.6% | 7% | 6% |
| UNet-6L-0R | … | … | … |
| UNet-4L-0R | 7.9% | 7.2% | 6.2% |
| UNet-2L-0R | 9.2% | 8.2% | 9.3% |
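Confluence here means the fraction of the image area covered by cells, and the RMSE above compares predicted and ground-truth confluence per image. A minimal sketch of that computation (the notebook's own implementation may differ):

```python
import math

def confluence(mask):
    """Confluence: fraction of foreground pixels in a flat 0/1 mask."""
    return sum(mask) / len(mask)

def confluence_rmse(pred_masks, true_masks):
    """RMSE between predicted and ground-truth confluence over image pairs."""
    errors = [confluence(p) - confluence(t) for p, t in zip(pred_masks, true_masks)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```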
[1] Edlund, C., Jackson, T.R., Khalid, N. et al. LIVECell—A large-scale dataset for label-free live cell segmentation. Nat Methods 18, 1038–1045 (2021). https://doi.org/10.1038/s41592-021-01249-6

[2] https://arxiv.org/abs/2211.02701