Final project for the Applied AI in biomedicine course.
Course held @ Politecnico di Milano
Academic year 2022 - 2023
The goal of this project is to develop a classifier able to detect and distinguish signs of pneumonia and tuberculosis in chest X-ray images.
In this project, we used the following packages:
- tensorflow
- keras
- opencv-python (cv2)
- keras_cv
- scikit-learn
- pandas
- numpy
- Pillow (PIL)
Important: keras_cv requires tensorflow v2.9+
The provided dataset is composed of 15470 CXR images of size 400x400, labeled as N (no findings), P (pneumonia), or T (tuberculosis), distributed as follows:
To improve the quality of the images, we apply the CLAHE method to enhance contrast and Gaussian blur to reduce noise.
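As a minimal sketch of these two steps with OpenCV (the clip limit, tile grid, and kernel size below are illustrative assumptions, not necessarily the exact values we used):

```python
import cv2
import numpy as np

def preprocess_cxr(path, clip_limit=2.0, tile_grid=(8, 8), blur_ksize=(3, 3)):
    """Load a CXR image, enhance contrast with CLAHE, then smooth with Gaussian blur."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)           # CXR images are single-channel
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    img = clahe.apply(img)                                  # local contrast enhancement
    img = cv2.GaussianBlur(img, blur_ksize, 0)              # light denoising
    return img.astype(np.float32) / 255.0                   # scale to [0, 1] for the network
```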
Deep-learning methods based on convolutional neural networks (CNNs) have shown increasing potential and efficiency in image recognition tasks; for this reason, we implement and compare different CNN-based architectures. The notebooks in which these models are trained can be found in the code folder. Finally, we use Grad-CAM and occlusion techniques to obtain explanations from our models.
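The specific architectures are defined in the notebooks; purely as an illustrative sketch (the EfficientNetB0 backbone, 3-channel input, and classification head below are assumptions, not the exact configuration we trained), a CNN-based classifier built with transfer learning in Keras looks like this:

```python
import tensorflow as tf
from tensorflow import keras

def build_model(input_shape=(400, 400, 3), num_classes=3):
    # ImageNet-pretrained backbone used as a frozen feature extractor
    backbone = keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=input_shape)
    backbone.trainable = False

    inputs = keras.Input(shape=input_shape)
    x = backbone(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dropout(0.3)(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy",
                  metrics=[keras.metrics.Precision(), keras.metrics.Recall()])
    return model
```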
Due to the high imbalance between classes, accuracy cannot be considered a good metric; precision, recall, and F1-score are more informative.
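A minimal sketch of how the per-class metrics can be reported with scikit-learn, together with one common way to counteract the imbalance (balanced class weights passed to `model.fit`); the helper names and the use of class weights here are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.utils.class_weight import compute_class_weight

CLASS_NAMES = ["No findings", "Pneumonia", "Tuberculosis"]

def report(y_true, y_pred):
    # Per-class precision, recall and F1-score (accuracy alone is misleading here)
    print(classification_report(y_true, y_pred, target_names=CLASS_NAMES, digits=3))

def balanced_class_weights(y_train):
    # Weights inversely proportional to class frequency,
    # to pass as model.fit(..., class_weight=balanced_class_weights(y_train))
    w = compute_class_weight(class_weight="balanced", classes=np.unique(y_train), y=y_train)
    return dict(enumerate(w))
```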
Our best model reaches the following performances on the test set:
Metrics | No findings | Pneumonia | Tuberculosis |
---|---|---|---|
Precision | 0.972 | 0.978 | 0.943 |
Recall | 0.980 | 0.985 | 0.887 |
F1-score | 0.976 | 0.982 | 0.914 |
Given the table above, the model performs very well in detecting pneumonia, whereas it struggles to identify tuberculosis. More precisely, since recall for tuberculosis is noticeably lower while precision remains high, the model misses some tuberculosis cases, but when it does predict tuberculosis it is almost always correct: it sometimes confuses T with N, while the opposite rarely happens.
Below we provide some examples of explainability through Grad-CAM on tuberculosis images.
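For reference, a condensed Grad-CAM sketch in Keras (the layer name and model are placeholders for our trained networks; this follows the standard Grad-CAM formulation rather than our exact implementation):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

def grad_cam(model, image, conv_layer_name, class_index=None):
    """Return a [0, 1] heatmap of the regions that drive the prediction."""
    grad_model = keras.Model(model.inputs,
                             [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)                  # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))            # global-average-pooled gradients
    cam = tf.reduce_sum(conv_out[0] * weights[0], axis=-1)  # weighted sum of feature maps
    cam = tf.nn.relu(cam)                                   # keep only positive influence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```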
We trained our models on the Colab platform, which provided an NVIDIA Tesla K80 GPU (24GB VRAM) and 12GB of RAM. Due to the size of the images and the memory consumption of the models at training time, we easily ran out of memory, so for our best models we could not afford a batch size greater than 32.
As a result, one epoch took about 470s on average. VRAM was not the only limitation: we also tried to optimize the data pipeline by caching all the images in RAM, so that the dataset iterator would not need to read images from disk; however, the available RAM was not sufficient, preventing us from performing this optimization.
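The intended optimization corresponds roughly to the following tf.data pipeline (the PNG format, the loader, and the shuffle/batch parameters are assumptions for illustration); the `.cache()` call is the step that requires the whole decoded dataset to fit in RAM:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def load_image(path, label):
    img = tf.io.read_file(path)
    img = tf.image.decode_png(img, channels=1)        # 400x400 grayscale CXR
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img, label

def make_dataset(paths, labels, batch_size=32, cache=True):
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    ds = ds.map(load_image, num_parallel_calls=AUTOTUNE)
    if cache:
        ds = ds.cache()   # keeps decoded images in RAM; needs enough memory for the whole set
    ds = ds.shuffle(1024).batch(batch_size).prefetch(AUTOTUNE)
    return ds
```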
Given these hardware limitations, we could not deeply explore the hyperparameter space or use cross-validation to obtain more robust results.
Name | Surname | GitHub |
---|---|---|
Sofia | Martellozzo | link |
Vlad Marian | Cimpeanu | link |
Federico | Caspani | link |