- Problem Statement
- Basic Implemented Models
- Improved Models
- Dataset
- Libraries
- Notebook Structure
- Equations
- Loss Function
- Data Augmentation
- Constructing Labels & Dataset Arrays
- Splitting Data
- Encoding & Normalization
- Essential Comparisons
- Graphs
- Optimizer Types
- Fully Connected Layers & Convolution
- Hyper-parameters
- Improved Models Initializations
- Training Loop
- Step Function
- Testing
- Pre-Trained Models Strategies
- Results
- Plots
- References
- Contributors
Given the chest X-ray dataset, our goal is to build a range of neural networks to diagnose chest X-rays. Our dataset consists of patients suffering from COVID-19, patients with viral pneumonia, and normal patients (3 classes).
- Fully Connected Neural Network Models.
- Many Layers
- Few Layers
- Convolutional Neural Network Models With A Classification Layer At The End.
- Many Layers
- Few Layers
- Famous Backbones.
- ResNet 50
- Inception V3
- VGG 16
- You can find the dataset used here
- TensorFlow
- Keras
- NumPy
- Scikit-learn
- PIL
- Matplotlib
├── Loading Images
│ ├── API File
│ └── Extract Images
├── Imports
├── Paths
├── Initialization & Images Reading
├── Augmentation
│ ├── Covid Augmentation
│ ├── Normal Augmentation
│ └── Viral Augmentation
├── Constructing Labels & Dataset Arrays
├── Splitting and Categorization
│ ├── Split
│ ├── Categorization (Labels' Encoding)
│ └── Normalization
├── Training loop
│ ├── HyperParameters
│ ├── Step Function
│ └── Main Training Function
├── Testing
│ └── Reporting Model Performance
├── Basic Models
│ ├── CNN Few Layers Model
│ │ ├── Layers
│ │ ├── With Augmentation
│ │ │ ├── Test Accuracy
│ │ │ └── Report
│ │ └── Without Augmentation
│ │   ├── Test Accuracy
│ │   └── Report
│ ├── CNN Many Layers Model
│ │ ├── Layers
│ │ ├── With Augmentation
│ │ │ ├── Test Accuracy
│ │ │ └── Report
│ │ └── Without Augmentation
│ │   ├── Test Accuracy
│ │   └── Report
│ ├── FCN Many Layers Model
│ │ ├── Layers
│ │ ├── With Augmentation
│ │ │ ├── Test Accuracy
│ │ │ └── Report
│ │ └── Without Augmentation
│ │   ├── Test Accuracy
│ │   └── Report
│ └── FCN Few Layers Model
│   ├── Layers
│   ├── With Augmentation
│   │ ├── Test Accuracy
│   │ └── Report
│   └── Without Augmentation
│     ├── Test Accuracy
│     └── Report
└── Pre-Trained Models
  ├── VGG 16
  │ ├── Unweighted
  │ │ └── Test Accuracy
  │ ├── Weighted (Freezing Earlier Layers)
  │ │ └── Test Accuracy
  │ └── Weighted (All Layers Trainable)
  │   └── Test Accuracy
  ├── ResNet 50
  │ ├── Unweighted
  │ │ └── Test Accuracy
  │ └── Weighted
  │   └── Test Accuracy
  └── Inception V3
    └── Freezing
      └── Test Accuracy
Precision = True Positives / (True Positives + False Positives)
Recall (or Sensitivity) = True Positives / (True Positives + False Negatives)
F1 Score = 2 ∗ (Precision ∗ Recall) / (Precision + Recall)
Accuracy = (True Positives + True Negatives) / (True Positives + False Positives + True Negatives + False Negatives)
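These per-class metrics can be generated with scikit-learn's `classification_report`. A minimal sketch (the variable and class names are assumptions, not taken from the notebook):

```python
import numpy as np
from sklearn.metrics import classification_report

# predicted class = index of the highest softmax probability
predY = np.argmax(model.predict(testX), axis=1)
trueY = np.argmax(testY, axis=1)  # undo the one-hot encoding

# per-class precision, recall, and F1 score
print(classification_report(trueY, predY, target_names=["COVID", "Normal", "Viral"]))
```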
L(Y, Ŷ) = −∑ ( Y ∗ log(Ŷ) + (1 − Y) ∗ log(1 − Ŷ) )

where Y is the true label, Ŷ is the predicted label, and L(Y, Ŷ) is the loss function.
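As a sanity check, the formula can be evaluated directly with NumPy. A toy sketch (not taken from the notebook):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # L(Y, Y^) = -sum( Y*log(Y^) + (1-Y)*log(1-Y^) )
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(cross_entropy(np.array([1.0, 0.0, 0.0]), np.array([0.8, 0.1, 0.1])))  # ~0.434
```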
Data Augmentation is a technique to increase the diversity of your training set by applying random (but realistic) transformations such as image rotation.
```python
import tensorflow as tf
from tensorflow.keras import layers

data_augmentation = tf.keras.Sequential([
    layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    layers.experimental.preprocessing.RandomRotation(0.2),
])
```
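Calling the pipeline on a batch applies a fresh random transform each time. A usage sketch (`images` is assumed to be a float tensor of shape `(batch, 256, 256, 3)`):

```python
augmented = data_augmentation(images, training=True)  # random flips and rotations
```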
```python
import numpy as np

# stack the three class arrays into one dataset with matching labels
Dataset = np.concatenate((COVID, Normal, Viral))
Y = np.concatenate((Covidlabel, Normallabel, Virallabel))
```
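The three label arrays can be built beforehand by assigning one integer per class. A minimal sketch (the construction itself is an assumption; names match the snippet above):

```python
# integer labels: 0 = COVID, 1 = Normal, 2 = Viral
Covidlabel = np.zeros(len(COVID))
Normallabel = np.ones(len(Normal))
Virallabel = np.full(len(Viral), 2)
```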
```python
from sklearn.model_selection import train_test_split

# 70/30 train/test split, stratified to keep the class proportions
trainX, testX, trainY, testY = train_test_split(Dataset, Y, test_size=0.3, random_state=40, stratify=Y)
# hold out 20% of the training data for validation
trainX, valX, trainY, valY = train_test_split(trainX, trainY, test_size=0.2, random_state=1, stratify=trainY)
```
```python
from tensorflow.keras.utils import to_categorical

# one-hot encode the integer labels (3 classes)
trainY = to_categorical(trainY, 3)
testY = to_categorical(testY, 3)

# scale pixel values from [0, 255] to [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0
```
Problem Type | Output Type | Final Activation Function | Loss Function |
---|---|---|---|
Regression | Numerical Value | Linear | Mean Squared Error (MSE) |
Classification | Binary Outcome | Sigmoid | Binary Cross-Entropy |
Classification | Single Label, Multiple Classes | Softmax | Cross-Entropy |
Classification | Multiple Labels, Multiple Classes | Sigmoid | Binary Cross-Entropy |
Softmax | Sigmoid |
---|---|
Used for multi-class classification in a logistic regression model. | Used for binary classification in a logistic regression model. |
The probabilities sum to 1. | The probabilities need not sum to 1. |
Used in the different layers of neural networks. | Used as an activation function while building neural networks. |
The highest value receives a higher probability than the other values. | A high value receives a high probability, but not necessarily the highest one. |
CategoricalCrossentropy class | BinaryCrossentropy class |
---|---|
Computes the crossentropy loss between the labels and predictions. | Computes the cross-entropy loss between true labels and predicted labels. |
Use this crossentropy loss function when there are two or more label classes. | Use this cross-entropy loss when there are only two label classes (assumed to be 0 and 1). |
tf.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0) | tf.keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0) |
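For this project's 3-class, one-hot-encoded labels, the categorical variant is the right choice. A toy example (the values are illustrative):

```python
import tensorflow as tf

y_true = [[0.0, 1.0, 0.0]]  # one-hot label for class 1
y_pred = [[0.1, 0.8, 0.1]]  # softmax output

cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # ~0.223 = -log(0.8)
```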
- Gradient descent is an optimization algorithm used when training a machine learning model. It iteratively tweaks the model's parameters to minimize a given (ideally convex) function toward a local minimum.
- Momentum is like a ball rolling downhill. The ball will gain momentum as it rolls down the hill.
- Momentum helps accelerate Gradient Descent (GD) when we have surfaces that curve more steeply in one direction than in another.
- Nesterov acceleration optimization is like a ball rolling down the hill but knows exactly when to slow down before the gradient of the hill increases again.
- Adagrad is an adaptive learning rate method: it adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones.
- It is well suited to sparse data, as in large-scale neural networks. GloVe word embeddings use Adagrad, where infrequent words require greater updates and frequent words require smaller updates.
- Adadelta is an extension of Adagrad that tries to reduce Adagrad's aggressive, monotonically decreasing learning rate.
- It does this by restricting the window of accumulated past gradients to some fixed size w. The running average at time t then depends on the previous average and the current gradient.
- RMSProp is Root Mean Square Propagation. It was devised by Geoffrey Hinton.
- RMSProp tries to resolve Adagrad's radically diminishing learning rates by using a moving average of the squared gradient. It uses the magnitude of recent gradients to normalize the current gradient.
- Adam is another method that computes an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradients.
- It also reduces the radically diminishing learning rates of Adagrad.
- Adam can be viewed as a combination of Adagrad, which works well on sparse gradients, and RMSProp, which works well in online and non-stationary settings.
- Adam uses an exponential moving average of the gradients to scale the learning rate, instead of the simple average used in Adagrad; it keeps an exponentially decaying average of past gradients.
- Nadam combines NAG and Adam.
- Nadam is employed for noisy gradients or for gradients with high curvature.
- The learning process is accelerated by summing the exponential decay of the moving averages of the previous and current gradients (see the instantiation sketch below).
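All of these optimizers are available in Keras. A sketch of how each might be instantiated (the learning rates are illustrative, not the project's settings):

```python
import tensorflow as tf

sgd      = tf.keras.optimizers.SGD(learning_rate=1e-3)
momentum = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)
nesterov = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True)
adagrad  = tf.keras.optimizers.Adagrad(learning_rate=1e-2)
adadelta = tf.keras.optimizers.Adadelta(learning_rate=1.0)
rmsprop  = tf.keras.optimizers.RMSprop(learning_rate=1e-3)
adam     = tf.keras.optimizers.Adam(learning_rate=1e-3)
nadam    = tf.keras.optimizers.Nadam(learning_rate=1e-3)
```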
```python
EPOCHS = 10     # number of training epochs
BS = 64         # batch size
INIT_LR = 1e-3  # initial learning rate
```

- Optimizer: Adam
- Loss function: categorical cross-entropy
- The number of layers differs from one model to another.

FCN Few Layers (a minimal Keras sketch follows the list):
- Input layer with shape (256, 256, 3), flattened to a 1D array
- Fully connected layer with 1024 neurons with activation function ReLU
- Fully connected layer with 50 neurons with activation function ReLU
- Output layer with 3 neurons (same number of labels) with activation function Softmax
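A minimal Keras sketch of this model (the flattening step and the variable name are assumptions):

```python
from tensorflow.keras import Sequential, layers

fcn_few = Sequential([
    layers.Flatten(input_shape=(256, 256, 3)),  # flatten the image into a 1D vector
    layers.Dense(1024, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(3, activation="softmax"),      # one neuron per class
])
```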
FCN Many Layers:
- Input layer with shape (256, 256, 3)
- Fully connected layer with 1200 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 1000 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 900 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 800 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 256 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 128 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 64 neurons with activation function ReLU
- Fully connected layer with 32 neurons with activation function ReLU
- Fully connected layer with 16 neurons with activation function ReLU
- Output layer with 3 neurons (same number of labels) with activation function Softmax
CNN Few Layers (a minimal Keras sketch follows the list):
- Convolution layer with 32 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 128 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Reshape the input to a 1D array.
- Fully connected layer with 600 neurons with activation function ReLU.
- Fully connected layer with 512 neurons with activation function ReLU.
- Dropout layer with 0.3.
- Output layer with 3 neurons (same number of labels) with activation function Softmax.
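A minimal Keras sketch of this model (the variable name is an assumption):

```python
from tensorflow.keras import Sequential, layers

cnn_few = Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                       # reshape to a 1D array
    layers.Dense(600, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),  # one neuron per class
])
```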
CNN Many Layers:
- Convolution layer with 32 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 64 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 256 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 512 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 700 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Fully connected layer with 256 neurons with activation function ReLU.
- Fully connected layer with 1024 neurons with activation function ReLU.
- Dropout layer with 0.3.
- Output layer with 3 neurons (same number of labels) with activation function Softmax.
```python
from tensorflow.keras.applications import InceptionV3, ResNet50, VGG16

# Inception V3: ImageNet weights, first 200 layers frozen
base_model = InceptionV3(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
for layer in base_model.layers[:200]:
    layer.trainable = False

# ResNet50: ImageNet weights
base_model = ResNet50(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
# ResNet50: random weights
base_model = ResNet50(input_shape=(256, 256, 3), include_top=False, weights=None)

# VGG16: ImageNet weights, all layers trainable
base_model = VGG16(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
# VGG16: ImageNet weights, all layers frozen
base_model = VGG16(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
base_model.trainable = False
# VGG16: random weights
base_model = VGG16(input_shape=(256, 256, 3), include_top=False, weights=None)
```
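Each backbone is created with `include_top=False`, so a small classification head has to be attached before training. A sketch of one way to do it (the pooling and head layers are assumptions, not taken from the notebook):

```python
from tensorflow.keras import layers, Model

x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(3, activation="softmax")(x)  # one neuron per class
model = Model(inputs=base_model.input, outputs=outputs)
```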
```python
import sys
import tensorflow as tf

def trainingloop(model, trainX, trainY, batchSize, epochs, optimizer):
    # number of batch updates per epoch
    numUpdates = int(trainX.shape[0] / batchSize)
    # loop over the number of epochs
    for epoch in range(0, epochs):
        # track the average loss over the epoch
        epoch_loss_avg = tf.keras.metrics.Mean()
        sys.stdout.flush()
        # loop over the data in batch-size increments
        for i in range(0, numUpdates):
            # determine starting and ending slice indexes for the current batch
            start = i * batchSize
            end = start + batchSize
            # take a gradient step on the current batch
            loss = step(model, trainX[start:end], trainY[start:end], optimizer)
            epoch_loss_avg.update_state(loss)
        print("[INFO] epoch {}/{} loss: {:.4f}".format(epoch + 1, epochs, epoch_loss_avg.result()))
```
```python
from tensorflow.keras.losses import categorical_crossentropy

def step(model, x, y, optimizer):
    # record the forward pass so gradients can be computed
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = categorical_crossentropy(y, pred)
    # compute the gradients and update the model's weights
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```
```python
def testing(modelx, testX, testY):
    # evaluate the compiled model on the held-out test set
    (loss, acc) = modelx.evaluate(testX, testY)
    print("[INFO] test accuracy: {:.4f}".format(acc))
```
- Train the entire model: use the architecture of the pre-trained model and train it on your data. You'll need a large dataset (and a lot of computational power).
- Train some layers and leave the others frozen: lower layers capture general features, while higher layers capture task-specific features. Here, we play with that dichotomy by choosing how many of the network's weights to adjust (a frozen layer does not change during training); see the sketch after this list.
  - Small dataset -> leave more layers frozen to avoid overfitting.
  - Large dataset -> improve your model by training more layers.
- Fixed feature extraction: keep the convolutional base in its original form and use its outputs to feed the classifier.
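In Keras, the second strategy reduces to toggling each layer's `trainable` flag. A minimal sketch (the 200-layer cut-off mirrors the Inception V3 initialization above):

```python
# freeze the earlier, general-feature layers; fine-tune the rest
for layer in base_model.layers[:200]:
    layer.trainable = False
for layer in base_model.layers[200:]:
    layer.trainable = True
```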
Models | Layers | Augmentation | Loss | Test Accuracy | F1-Score Class0 | Recall Class0 | Precision Class0 | F1-Score Class1 | Recall Class1 | Precision Class1 | F1-Score Class2 | Recall Class2 | Precision Class2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Fully Connected | Few Layers | Yes | 1.0535 | 77.66% | 79.18% | 88.75% | 71.48% | 84.04% | 95.83% | 74.84% | 62.78% | 46.39% | 97.09% |
Fully Connected | Many Layers | Yes | 0.4197 | 84.15% | 82.02% | 80.14% | 83.99% | 90.60% | 97.78% | 84.41% | 79.32% | 74.86% | 84.35% |
CNN | Few Layers | Yes | 0.8994 | 85.75% | 85.69% | 85.69% | 85.69% | 89.19% | 83.61% | 95.56% | 83.92% | 89.17% | 79.26% |
CNN | Many Layers | Yes | 0.3850 | 90.18% | 89.35% | 81.53% | 94.07% | 90.66% | 97.78% | 84.51% | 87.07% | 86.67% | 88.76% |
Fully Connected | Few Layers | No | 2.3815 | 82.76% | 80.32% | 98.51% | 67.81% | 91.48% | 85.00% | 90.03% | 74.15% | 62.13% | 91.94% |
Fully Connected | Many Layers | No | 0.7163 | 88.78% | 88.03% | 82.34% | 94.87% | 92.51% | 87.50% | 98.13% | 85.87% | 95.54% | 77.98% |
CNN | Few Layers | No | 0.2063 | 94.28% | 93.43% | 97.26% | 89.89% | 96.68% | 96.94% | 96.41% | 92.37% | 88.37% | 96.75% |
CNN | Many Layers | No | 0.2039 | 93.33% | 94.33% | 95.27% | 93.41% | 93.53% | 88.33% | 99.38% | 91.90% | 95.54% | 88.53% |
Models | Layers | Loss | Test Accuracy | F1-Score Class0 | Recall Class0 | Precision Class0 | F1-Score Class1 | Recall Class1 | Precision Class1 | F1-Score Class2 | Recall Class2 | Precision Class2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ResNet 50 | Pre-Trained Weighted | 0.2736 | 97.03% | 96.86% | 96.02% | 97.72% | 98.74% | 98.33% | 99.16% | 96.58% | 97.77% | 95.41% |
ResNet 50 | Random weights | 0.2559 | 94.08% | 93.28% | 91.54% | 95.09% | 97.02% | 96.11% | 98.03% | 92.18% | 94.80% | 98.70% |
VGG 16 | Pre-Trained Weighted | 0.1652 | 95.05% | 95.10% | 96.52% | 93.72% | 85.89% | 93.89% | 97.97% | 92.97% | 93.32% | 92.63% |
VGG 16 | Random weights | 0.2088 | 95.06% | 93.58% | 99.75% | 88.13% | 98.34% | 98.89% | 97.80% | 91.88% | 95.40% | 99.42% |
Inception V3 | Pre-Trained Freezing Layers | 0.1634 | 96.40% | 94.48% | 91.53% | 97.63% | 98.95% | 98.61% | 99.30% | 94.77% | 98.06% | 91.69% |
Hover over the plots to check the model name.
- Deep learning based detection and analysis of COVID-19 on chest X-ray images
- Detailed Guide to Understand and Implement ResNets
- Ahmed Samy
- Omar Reda