Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from owczr/develop
Preprocessing, Model Builders and Directors, Dataset Loader
Showing 86 changed files with 271,780 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,7 @@ | ||
*.log | ||
test/ | ||
.idea/ | ||
LIDC-IDRI/ | ||
.vscode/ | ||
__pycache__/ | ||
.env |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,18 @@ | ||
# Lung Cancer Detection | ||
|
||
## Table of Contents | ||
- [About](#about) | ||
- [Usage](#usage) | ||
- [License](#license) | ||
|
||
## About | ||
Lung Cancer Detection is a project made for the Engineer's Thesis "Applications of artificial intelligence in oncology on computer tomography dataset" by **Jakub Owczarek**, under the guidance of Thesis Advisor dr. hab. inz **Mariusz Mlynarczuk** prof. AGH. | ||
<br> | ||
The goal of this project is to process the [LIDC-IDRI](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=1966254) dataset and fine-tune deep learning models. | ||
|
||
## Usage | ||
|
||
TODO: Fill in how to use this project locally and on Azure ML | ||
|
||
## License | ||
This project is licensed under the MIT License. See the LICENSE.md file for details. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Annotation Processor | ||
## About | ||
The `AnnotationProcessor` class implements the `BaseProcessor` interface. Its purpose is to process the XML annotation files provided by the LIDC-IDRI dataset into a format that will be used for image classification. | ||
|
||
## Format | ||
The format used for image classification is just the z position of each annotated slice, which indicates whether a tumor is present on a given image slice. |
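
As a rough illustration of this idea, the sketch below collects the z positions of annotated slices from one annotation document. It assumes that the LIDC-IDRI XML stores each annotated slice position in an `imageZposition` element. The function names `extract_z_positions` and `has_tumor` are hypothetical and are not taken from this project.

```python
import xml.etree.ElementTree as ET


def extract_z_positions(xml_text: str) -> set:
    """Collect the z positions of all annotated slices in one XML document."""
    root = ET.fromstring(xml_text)
    positions = set()
    for element in root.iter():
        # Strip an optional "{namespace}" prefix from the tag name.
        tag = element.tag.rsplit("}", 1)[-1]
        # Assumption: annotated slice positions live in <imageZposition>.
        if tag == "imageZposition" and element.text:
            positions.add(float(element.text.strip()))
    return positions


def has_tumor(slice_z: float, positions: set) -> bool:
    """A slice is labeled positive when its z position was annotated."""
    return slice_z in positions
```

The exact float comparison in `has_tumor` only works when both values come from the same textual representation; a tolerance may be needed when z positions are read from DICOM headers.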
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Base Processor | ||
## About | ||
An abstract base class that provides an interface for processing and saving data. It acts as a blueprint for all other processors. | ||
The `BaseProcessor` interface was made to unify the processors in this project. |
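
A minimal sketch of such an interface is shown below, using Python's `abc` module. The method names `process` and `save` and the `TextProcessor` subclass are assumptions made for illustration; the project's actual signatures may differ.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseProcessor(ABC):
    """Blueprint that every processor is expected to follow."""

    @abstractmethod
    def process(self):
        """Transform the raw input into its processed form."""

    @abstractmethod
    def save(self, path):
        """Write the processed data to `path`."""


class TextProcessor(BaseProcessor):
    """Hypothetical processor that upper-cases a string."""

    def __init__(self, text: str):
        self.text = text
        self.result = None

    def process(self):
        self.result = self.text.upper()
        return self.result

    def save(self, path):
        Path(path).write_text(self.result or "")
```

Because both methods are abstract, `BaseProcessor` itself cannot be instantiated, and a subclass that forgets one of them fails at construction time.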
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Dataset Loader | ||
## About | ||
The `DatasetLoader` class yields batches of processed DICOM images from the LIDC-IDRI dataset to Keras models. Its `get_dataset` method returns a `tf.data.Dataset` object built with the `from_generator` method and a custom `_data_generator`. | ||
The `_data_generator` method loads batches of processed DICOM images from `.npy` files and yields them. |
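
The sketch below shows one way this could look. It assumes that each `.npy` file holds a single 2D image and that all images share one shape; the constructor arguments are hypothetical. Labels are left out because this document does not describe how they are stored.

```python
from pathlib import Path

import numpy as np


class DatasetLoader:
    """Sketch of a loader that feeds batches of .npy images to Keras."""

    def __init__(self, directory, batch_size=32):
        # Assumption: one processed image per .npy file in `directory`.
        self.files = sorted(Path(directory).glob("*.npy"))
        self.batch_size = batch_size

    def _data_generator(self):
        """Yield float32 arrays of shape (batch, height, width)."""
        for start in range(0, len(self.files), self.batch_size):
            batch_files = self.files[start:start + self.batch_size]
            batch = np.stack([np.load(path) for path in batch_files])
            yield batch.astype(np.float32)

    def get_dataset(self):
        """Wrap the generator in a tf.data.Dataset."""
        import tensorflow as tf  # imported lazily; only needed here

        return tf.data.Dataset.from_generator(
            self._data_generator,
            output_signature=tf.TensorSpec(
                shape=(None, None, None), dtype=tf.float32
            ),
        )
```

Loading files inside the generator keeps only one batch in memory at a time, which matters for a dataset of this size.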
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Dataset Processor | ||
## About | ||
The `DatasetProcessor` implements the `BaseProcessor` interface. Its purpose is to process the whole LIDC-IDRI dataset by using the `DicomProcessor` and `AnnotationProcessor` classes. | ||
|
||
## Parallelization | ||
Parallelization is used to speed up the processing. It was implemented by using the `ProcessPoolExecutor` from Python's built-in `concurrent.futures` module. | ||
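
A minimal sketch of this pattern follows. The worker `process_patient` and the `executor_cls` parameter are hypothetical; the real worker would run the DICOM and annotation processors for one patient directory.

```python
from concurrent.futures import ProcessPoolExecutor


def process_patient(patient_id: str) -> str:
    """Hypothetical stand-in for processing one patient's scans."""
    return f"processed {patient_id}"


def process_all(patient_ids, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Process every patient in a pool of workers, preserving input order."""
    with executor_cls(max_workers=max_workers) as executor:
        # executor.map distributes the calls and yields results in order.
        return list(executor.map(process_patient, patient_ids))
```

When the default `ProcessPoolExecutor` is used, the worker must be defined at module level so it can be pickled, and on platforms that spawn new interpreters the call to `process_all` should sit under an `if __name__ == "__main__":` guard.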
|
||
## Train Test Split | ||
After processing is done, the user can split the processed directory into train and test subdirectories. | ||
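
One possible shape for such a split is sketched below. The function name, the `train` and `test` directory names, the 20% test fraction, and the fixed seed are all assumptions, not the project's actual behavior.

```python
import random
import shutil
from pathlib import Path


def train_test_split(processed_dir, test_fraction=0.2, seed=42):
    """Move processed files into train/ and test/ subdirectories."""
    processed = Path(processed_dir)
    files = sorted(path for path in processed.iterdir() if path.is_file())
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_fraction)

    train_dir = processed / "train"
    test_dir = processed / "test"
    train_dir.mkdir(exist_ok=True)
    test_dir.mkdir(exist_ok=True)

    for index, path in enumerate(files):
        target = test_dir if index < n_test else train_dir
        shutil.move(str(path), str(target / path.name))
    return train_dir, test_dir
```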
|
||
## Remove Methods | ||
Additionally, methods were implemented that remove the processed directory or the train test split directory. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Dicom Processor | ||
## About | ||
The `DicomProcessor` class implements the `BaseProcessor` interface. Its purpose is to process a single DICOM image from the LIDC-IDRI dataset and save it in NumPy format for further use. | ||
|
||
## Segmentation | ||
Lung segmentation is used in the processing. It was implemented by combining several image processing techniques. The segmentation function performs the following steps: | ||
|
||
1. Select a threshold using Otsu's method | ||
2. Create a reverse binary mask | ||
3. Remove border | ||
4. Remove small objects | ||
5. Remove small holes | ||
6. Perform binary closing | ||
7. Perform binary opening | ||
8. Include annotations | ||
|
||
In some cases, the part of the image containing the tumor could be lost during this processing; the last step ensures that it is included. |
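
The eight steps above can be sketched with NumPy and SciPy as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the `min_size` value, and the 3x3 structuring element are hypothetical, and the project may rely on a different library.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(image, bins=256):
    """Otsu's method; returns the upper edge of the selected histogram bin."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)              # weight of the class at or below a bin
    w1 = np.cumsum(hist[::-1])[::-1]  # weight of the class at or above a bin
    m0 = np.cumsum(hist * centers) / np.maximum(w0, 1)
    m1 = (np.cumsum((hist * centers)[::-1]) / np.maximum(w1[::-1], 1))[::-1]
    variance = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return float(edges[np.argmax(variance) + 1])


def clear_border(mask):
    """Drop connected components that touch the image border."""
    labels, _ = ndimage.label(mask)
    border = np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]
    )
    touching = np.unique(border[border != 0])
    return mask & ~np.isin(labels, touching)


def remove_small_objects(mask, min_size):
    """Drop connected components with fewer than `min_size` pixels."""
    labels, _ = ndimage.label(mask)
    keep = np.bincount(labels.ravel()) >= min_size
    keep[0] = False  # label 0 is the background
    return keep[labels]


def segment_lungs(image, annotation_mask=None, min_size=64):
    """Return a boolean lung mask for one 2D slice."""
    threshold = otsu_threshold(image)                    # step 1
    mask = image < threshold                             # step 2: dark = air
    mask = clear_border(mask)                            # step 3
    mask = remove_small_objects(mask, min_size)          # step 4
    mask = ~remove_small_objects(~mask, min_size)        # step 5: fill holes
    structure = np.ones((3, 3), dtype=bool)
    mask = ndimage.binary_closing(mask, structure=structure)  # step 6
    mask = ndimage.binary_opening(mask, structure=structure)  # step 7
    if annotation_mask is not None:                      # step 8
        mask = mask | annotation_mask.astype(bool)
    return mask
```

Step 5 reuses `remove_small_objects` on the inverted mask: a small hole in the lungs is a small object in the background, so removing it there fills the hole.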
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Model Builders | ||
## About | ||
Model builders are classes that implement the base `ModelBuilder` interface, whose purpose is to build a model with: | ||
|
||
1. Preprocessing layers | ||
2. Base model layers | ||
3. Output layers | ||
|
||
## Base Models | ||
|
||
The base models are taken from [Keras Applications](https://keras.io/api/applications/). The exact models used are listed in the table below. For more information, see the Keras website and the corresponding papers. | ||
|
||
|
||
|Project Name|Keras Model|Parameters|Depth| | ||
|-|-|-|-| | ||
|ConvNeXt|ConvNeXtSmall|50.2M|-| | ||
|DenseNet|DenseNet121|8.1M|242| | ||
|EfficientNetV2|EfficientNetV2B0|7.2M|-| | ||
|EfficientNet|EfficientNetB0|5.3M|132| | ||
|InceptionResNet|InceptionResNetV2|55.9M|449| | ||
|InceptionNet|InceptionV3|23.9M|189| | ||
|MobileNet|MobileNetV3Small|2.9M|-| | ||
|NASNet|NASNetMobile|5.3M|389| | ||
|ResNetV2|ResNet50V2|25.6M|103| | ||
|ResNet|ResNet50|25.6M|107| | ||
|VGG|VGG16|138.4M|16| | ||
|Xception|Xception|22.9M|81| |
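
A builder for one of these base models might look like the sketch below. The method names, the `ResNetBuilder` class, the input shape, the `Rescaling` preprocessing layer, and the single sigmoid output are assumptions chosen for illustration; TensorFlow is imported lazily so the interface itself has no heavy dependency.

```python
from abc import ABC, abstractmethod


class ModelBuilder(ABC):
    """Interface with one method per group of layers."""

    @abstractmethod
    def add_preprocessing(self):
        """Add preprocessing layers."""

    @abstractmethod
    def add_base_model(self):
        """Add the base model layers."""

    @abstractmethod
    def add_output(self):
        """Add output layers."""

    @abstractmethod
    def build(self):
        """Return the finished model."""


class ResNetBuilder(ModelBuilder):
    """Hypothetical builder around the Keras ResNet50 application."""

    def __init__(self, input_shape=(224, 224, 3), weights="imagenet"):
        import tensorflow as tf  # imported lazily

        self._tf = tf
        self._weights = weights
        self._inputs = tf.keras.Input(shape=input_shape)
        self._x = self._inputs

    def add_preprocessing(self):
        # Assumption: inputs are scaled to [0, 1] before the base model.
        self._x = self._tf.keras.layers.Rescaling(1.0 / 255)(self._x)

    def add_base_model(self):
        base = self._tf.keras.applications.ResNet50(
            include_top=False, weights=self._weights, pooling="avg"
        )
        self._x = base(self._x)

    def add_output(self):
        # Assumption: binary classification (tumor present or not).
        self._x = self._tf.keras.layers.Dense(1, activation="sigmoid")(self._x)

    def build(self):
        return self._tf.keras.Model(self._inputs, self._x)
```

Swapping the base model then only requires a new subclass that overrides `add_base_model`, which is the main benefit of keeping the three steps separate.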
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Model Director | ||
## About | ||
The `ModelDirector` class implements the `BaseModelDirector` interface. Its purpose is to define the order of the building steps for `ModelBuilder`s and to make the final `tf.keras.Model`. |
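
The director's role can be sketched without any Keras dependency. In the example below, the `make` method name and the `RecordingBuilder` stand-in are hypothetical; a real builder would return a `tf.keras.Model` from `build`.

```python
class RecordingBuilder:
    """Hypothetical stand-in builder that records the order of calls."""

    def __init__(self):
        self.steps = []

    def add_preprocessing(self):
        self.steps.append("preprocessing")

    def add_base_model(self):
        self.steps.append("base_model")

    def add_output(self):
        self.steps.append("output")

    def build(self):
        return list(self.steps)


class ModelDirector:
    """Runs the building steps of a builder in a fixed order."""

    def __init__(self, builder):
        self.builder = builder

    def make(self):
        self.builder.add_preprocessing()
        self.builder.add_base_model()
        self.builder.add_output()
        return self.builder.build()
```

Keeping the step order in the director means every builder produces a model with the same overall layout, regardless of which base model it wraps.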
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Notebooks | ||
Notebooks were used for analysis and development of individual solutions. | ||
|
||
## Annotations | ||
This notebook analyzes an annotation XML file. It visualizes a nodule on an image and presents a processing method. | ||
|
||
## Diagnosis | ||
This notebook was used to read the `tcia-diagnosis-data-2012-04-20.xls` file. | ||
|
||
## Dicom Viewer | ||
In this notebook we can load a DICOM image and see how it looks. | ||
|
||
## Dicom | ||
This notebook shows the various DICOM tags. | ||
|
||
## Metadata | ||
This notebook analyzes some of the metadata that came with the dataset. | ||
|
||
## Segmentation | ||
This notebook explains each segmentation step used for creating a lung mask. |
264,232 changes: 264,232 additions & 0 deletions
notebooks/analysis/check_preprocessed.ipynb