Merge pull request #1 from owczr/develop
Preprocessing, Model Builders and Directors, Dataset Loader
owczr authored Dec 28, 2023
2 parents 587f6b6 + 88295b2 commit cc81dbe
Showing 86 changed files with 271,780 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -1 +1,7 @@
*.log
test/
.idea/
LIDC-IDRI/
.vscode/
__pycache__/
.env
17 changes: 17 additions & 0 deletions README.md
@@ -1 +1,18 @@
# Lung Cancer Detection

## Table of Contents
- [About](#about)
- [Usage](#usage)
- [License](#license)

## About
Lung Cancer Detection is a project created for the Engineer's Thesis "Applications of artificial intelligence in oncology on computer tomography dataset" by **Jakub Owczarek**, under the guidance of Thesis Advisor dr. hab. inz **Mariusz Mlynarczuk** prof. AGH.
<br>
The goal of this project is to process the [LIDC-IDRI](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254) dataset and fine-tune deep learning models.

## Usage

TODO: Fill in how to use this project locally and on Azure ML

## License
This project is licensed under the MIT License. See the LICENSE.md file for details.
6 changes: 6 additions & 0 deletions docs/annotation_processor.md
@@ -0,0 +1,6 @@
# Annotation Processor
## About
The `AnnotationProcessor` class implements the `BaseProcessor` interface. Its purpose is to process the XML annotation files provided by the LIDC-IDRI dataset into a format that can be used for image classification.

## Format
The format used for image classification is simply the z position of each annotated slice. It indicates whether a tumor is present on a given image slice or not.
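
As an illustration, z positions could be collected from an annotation file with the standard library alone. This is a minimal sketch, not the project's `AnnotationProcessor`: the functions `extract_z_positions` and `has_tumor` are hypothetical, and the element name `imageZposition` and the simplified inline XML are assumptions modeled on LIDC-IDRI annotation files.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical annotation fragment; real LIDC-IDRI files are
# larger and namespaced, and the element names here are assumptions.
SAMPLE_XML = """
<LidcReadMessage>
  <readingSession>
    <unblindedReadNodule>
      <roi><imageZposition>-125.0</imageZposition></roi>
      <roi><imageZposition>-122.5</imageZposition></roi>
    </unblindedReadNodule>
  </readingSession>
  <readingSession>
    <unblindedReadNodule>
      <roi><imageZposition>-122.5</imageZposition></roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>
"""


def extract_z_positions(xml_text: str) -> set[float]:
    """Collect the z position of every annotated region of interest."""
    root = ET.fromstring(xml_text)
    positions = set()
    for element in root.iter():
        # Drop any "{namespace}" prefix so the lookup works either way.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag == "imageZposition" and element.text:
            positions.add(float(element.text))
    return positions


def has_tumor(slice_z: float, annotated: set[float]) -> bool:
    """A slice is labeled positive when its z position was annotated."""
    return slice_z in annotated
```

Exact float comparison works here because both values are assumed to come from the same textual source. Matching against z positions read from DICOM headers might need a small tolerance instead.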
4 changes: 4 additions & 0 deletions docs/base_processor.md
@@ -0,0 +1,4 @@
# Base Processor
## About
An abstract base class that provides an interface for processing and saving data. It acts as a blueprint for all other processors.
The `BaseProcessor` interface was made to unify the processors in this project.
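
A sketch of what such an interface could look like, using Python's `abc` module. The method names `process` and `save` are assumptions derived from the description above, and `UppercaseProcessor` is a toy subclass that exists only to show the contract being fulfilled.

```python
from abc import ABC, abstractmethod


class BaseProcessor(ABC):
    """Hypothetical interface: every processor can process and save."""

    @abstractmethod
    def process(self):
        """Transform the raw input and return the result."""

    @abstractmethod
    def save(self, destination):
        """Persist the processed result to `destination`."""


class UppercaseProcessor(BaseProcessor):
    """Toy processor used only to show the contract being fulfilled."""

    def __init__(self, text):
        self.text = text
        self.result = None

    def process(self):
        self.result = self.text.upper()
        return self.result

    def save(self, destination):
        # `destination` is any object with a `write` method, e.g. a file.
        destination.write(self.result)
```

Because both methods are abstract, instantiating `BaseProcessor` directly raises a `TypeError`, so a subclass that forgets one of them fails early.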
4 changes: 4 additions & 0 deletions docs/dataset_loader.md
@@ -0,0 +1,4 @@
# Dataset Loader
## About
The `DatasetLoader` class feeds batches of processed DICOM images from the LIDC-IDRI dataset to Keras models. It implements a `get_dataset` method, which returns a `tf.data.Dataset` object built with the `from_generator` method and a custom `_data_generator`.
The `_data_generator` method loads batches of processed DICOM images from `.npy` files and yields them.
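
The batching idea behind such a generator can be sketched without any framework. The function below is hypothetical and assumes one stored item per file. `load_fn` stands in for whatever reads a single file, which could be `numpy.load` for `.npy` files.

```python
from typing import Callable, Iterator, List, Sequence


def data_generator(
    paths: Sequence[str],
    batch_size: int,
    load_fn: Callable[[str], object],
) -> Iterator[List[object]]:
    """Yield lists of loaded items, `batch_size` at a time."""
    batch = []
    for path in paths:
        batch.append(load_fn(path))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch
```

A generator of this shape, once each batch is stacked into an array, is the kind of callable that `tf.data.Dataset.from_generator` accepts together with an `output_signature`.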
12 changes: 12 additions & 0 deletions docs/dataset_processor.md
@@ -0,0 +1,12 @@
# Dataset Processor
## About
The `DatasetProcessor` class implements the `BaseProcessor` interface. Its purpose is to process the whole LIDC-IDRI dataset by using the `DicomProcessor` and `AnnotationProcessor` classes.

## Parallelization
Parallelization is used to speed up the processing. It is implemented with the `ProcessPoolExecutor` from Python's built-in `concurrent.futures` library.
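
A minimal sketch of this pattern follows. The names `process_patient` and `process_dataset`, and the `executor_cls` parameter, are illustrative assumptions rather than the project's API. The per-patient function is a placeholder for the real DICOM and annotation processing.

```python
from concurrent.futures import ProcessPoolExecutor


def process_patient(patient_dir: str) -> str:
    """Placeholder for the per-patient work; returns a status string."""
    return f"processed {patient_dir}"


def process_dataset(patient_dirs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run `process_patient` over all directories using a pool of workers."""
    with executor_cls(max_workers=max_workers) as executor:
        # `map` preserves input order and re-raises worker exceptions.
        return list(executor.map(process_patient, patient_dirs))
```

When run as a script, the call to `process_dataset` should sit under an `if __name__ == "__main__":` guard, because some platforms start worker processes by re-importing the main module. Passing `ThreadPoolExecutor` as `executor_cls` is a convenient way to debug the same code path without spawning processes.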

## Train Test Split
After the processing is done, the user can split the processed directory into train and test subdirectories.
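
One way such a split could be computed is shown below. This is a sketch under assumptions: the function name, the default 80/20 ratio, and the fixed seed are not taken from the project.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle `items` reproducibly and split them into (train, test)."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

The two returned lists could then be moved into `train` and `test` subdirectories, for example with `shutil.move`.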

## Remove Methods
Additionally, methods are provided to remove the processed directory or the train/test split directory.
17 changes: 17 additions & 0 deletions docs/dicom_processor.md
@@ -0,0 +1,17 @@
# Dicom Processor
## About
The `DicomProcessor` class implements the `BaseProcessor` interface. Its purpose is to process a single DICOM image from the LIDC-IDRI dataset and save it in NumPy format for further use.

## Segmentation
Lung segmentation is applied during processing. It combines several image processing techniques, and the segmentation function performs the following steps in order:

1. Select a threshold using Otsu's method
2. Create a reverse binary mask
3. Remove border
4. Remove small objects
5. Remove small holes
6. Perform binary closing
7. Perform binary opening
8. Include annotations

In some cases, the part of the image containing the tumor could be lost during this processing; the last step ensures that it is included.
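
To make the first two steps concrete, here is a plain-Python sketch of Otsu thresholding followed by a reverse binary mask. It is not the project's implementation, which presumably relies on an image processing library, and it does not cover steps 3 to 8. The function names are hypothetical, and treating pixels at or below the threshold as foreground assumes that the lungs appear darker than the surrounding tissue.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    `pixels` is a flat iterable of integers in [0, levels). Pixels with
    value <= t form one class, the rest form the other.
    """
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, so the variance is undefined
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


def reverse_binary_mask(image, threshold):
    """Mark pixels at or below the threshold (dark regions) as True."""
    return [[pixel <= threshold for pixel in row] for row in image]
```

The threshold search visits each gray level once and updates running sums, so its cost is linear in the number of levels after a single pass over the pixels.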
27 changes: 27 additions & 0 deletions docs/model_builders.md
@@ -0,0 +1,27 @@
# Model Builders
## About
Model builders are classes implementing the base `ModelBuilder` interface, whose purpose is to build a model with:

1. Preprocessing layers
2. Base model layers
3. Output layers
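
The three parts listed above map naturally onto one builder method each. The sketch below is an assumption about the shape of such an interface, not the project's code: the method names are hypothetical, and the toy builder records layer names as strings instead of creating Keras layers.

```python
from abc import ABC, abstractmethod


class ModelBuilder(ABC):
    """Hypothetical builder interface with one method per building step."""

    def __init__(self):
        self.layers = []  # stand-in for a real Keras layer stack

    @abstractmethod
    def add_preprocessing(self):
        """Add the preprocessing layers."""

    @abstractmethod
    def add_base_model(self):
        """Add the base model layers."""

    @abstractmethod
    def add_output(self):
        """Add the output layers."""


class ToyResNetBuilder(ModelBuilder):
    """Records layer names as strings instead of creating Keras layers."""

    def add_preprocessing(self):
        self.layers.append("preprocessing")

    def add_base_model(self):
        self.layers.append("ResNet50")

    def add_output(self):
        self.layers.append("sigmoid_output")
```

With this layout, supporting another base model only requires a new subclass that overrides the three methods.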

## Base Models

The base models are taken from [Keras Applications](https://keras.io/api/applications/). The exact models are listed in the table below. For more information, see the Keras website and the corresponding papers.


|Project Name|Keras Model|Parameters|Depth|
|-|-|-|-|
|ConvNeXt|ConvNeXtSmall|50.2M|-|
|DenseNet|DenseNet121|8.1M|242|
|EfficientNetV2|EfficientNetV2B0|7.2M|-|
|EfficientNet|EfficientNetB0|5.3M|132|
|InceptionResNet|InceptionResNetV2|55.9M|449|
|InceptionNet|InceptionV3|23.9M|189|
|MobileNet|MobileNetV3Small|2.9M|-|
|NASNet|NASNetMobile|5.3M|389|
|ResNetV2|ResNet50V2|25.6M|103|
|ResNet|ResNet50|25.6M|107|
|VGG|VGG16|138.4M|16|
|Xception|Xception|22.9M|81|
3 changes: 3 additions & 0 deletions docs/model_director.md
@@ -0,0 +1,3 @@
# Model Director
## About
The `ModelDirector` class implements the `BaseModelDirector` interface. Its purpose is to define the order of the building steps for `ModelBuilder`s and to make the final `tf.keras.Model`.
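
The director's role can be sketched as follows. All method names here (`add_preprocessing`, `add_base_model`, `add_output`, `get_model`, `make`) are assumptions for illustration, and `RecordingBuilder` is a stand-in that returns a tuple of step names where a real builder would return a `tf.keras.Model`.

```python
class RecordingBuilder:
    """Minimal stand-in builder that records which steps were called."""

    def __init__(self):
        self.steps = []

    def add_preprocessing(self):
        self.steps.append("preprocessing")

    def add_base_model(self):
        self.steps.append("base_model")

    def add_output(self):
        self.steps.append("output")

    def get_model(self):
        # A real builder would return a tf.keras.Model here.
        return tuple(self.steps)


class ModelDirector:
    """Hypothetical director: owns the order of the building steps."""

    def __init__(self, builder):
        self.builder = builder

    def make(self):
        self.builder.add_preprocessing()
        self.builder.add_base_model()
        self.builder.add_output()
        return self.builder.get_model()
```

Keeping the step order in the director means that every builder produces models assembled in the same sequence, regardless of which base model it wraps.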
20 changes: 20 additions & 0 deletions docs/notebooks.md
@@ -0,0 +1,20 @@
# Notebooks
Notebooks were used for analysis and development of individual solutions.

## Annotations
This notebook analyzes an annotation XML file. It visualizes a nodule on an image and presents a processing method.

## Diagnosis
This notebook was used to read the `tcia-diagnosis-data-2012-04-20.xls` file.

## Dicom Viewer
This notebook loads a DICOM image and shows how it looks.

## Dicom
This notebook shows the various DICOM tags.

## Metadata
This notebook analyzes some of the metadata that came with the dataset.

## Segmentation
This notebook explains each segmentation step used for creating a lung mask.
307 changes: 307 additions & 0 deletions notebooks/analysis/annotations.ipynb

Large diffs are not rendered by default.

264,232 changes: 264,232 additions & 0 deletions notebooks/analysis/check_preprocessed.ipynb

Large diffs are not rendered by default.
