- Problem Statement
- Basic Implemented Models
- Improved Models
- Dataset
- Libraries
- Notebook Structure
- Equations
- Loss Function
- Data Augmentation
- Constructing Labels & Dataset Arrays
- Splitting Data
- Encoding & Normalization
- Essential Comparisons
- Graphs
- Optimizer Types
- Fully Connected Layers & Convolution
- Hyper-parameters
- Improved Models Initializations
- Training Loop
- Step Function
- Testing
- Pre-Trained Models Strategies
- Results
- Plots
- References
- Contributors
Given the chest X-ray dataset, our goal is to build a range of neural networks to diagnose chest X-rays. Our dataset consists of patients suffering from COVID-19, patients with viral pneumonia, and normal patients (3 classes).
- Fully Connected Neural Network Models.
- Many Layers
- Few Layers
- Convolutional Neural Network Models With A Classification Layer At The End.
- Many Layers
- Few Layers
- Famous Backbones.
- ResNet 50
- Inception V3
- VGG 16
- You can find the dataset used here
- TensorFlow
- Keras
- NumPy
- Scikit-learn
- PIL
- Matplotlib
├── Loading Images
│ ├── API File
│ └── Extract Images
├── Imports
├── Paths
├── Initialization & Images Reading
├── Augmentation
│ ├── Covid Augmentation
│ ├── Normal Augmentation
│ └── Viral Augmentation
├── Constructing Labels & Dataset Arrays
├── Splitting and Categorization
│ ├── Split
│ ├── Categorization (Labels' Encoding)
│ └── Normalization
├── Training loop
│ ├── HyperParameters
│ ├── Step Function
│ └── Main Training Function
├── Testing
│ └── Reporting Model Performance
├── Basic Models
│ ├── CNN Few Layers Model
│ │ ├── Layers
│ │ ├── With Augmentation
│ │ │ ├── Test Accuracy
│ │ │ └── Report
│ │ └── Without Augmentation
│ │   ├── Test Accuracy
│ │   └── Report
│ ├── CNN Many Layers Model
│ │ ├── Layers
│ │ ├── With Augmentation
│ │ │ ├── Test Accuracy
│ │ │ └── Report
│ │ └── Without Augmentation
│ │   ├── Test Accuracy
│ │   └── Report
│ ├── FCN Many Layers Model
│ │ ├── Layers
│ │ ├── With Augmentation
│ │ │ ├── Test Accuracy
│ │ │ └── Report
│ │ └── Without Augmentation
│ │   ├── Test Accuracy
│ │   └── Report
│ └── FCN Few Layers Model
│   ├── Layers
│   ├── With Augmentation
│   │ ├── Test Accuracy
│   │ └── Report
│   └── Without Augmentation
│     ├── Test Accuracy
│     └── Report
└── Pre-Trained Models
  ├── VGG 16
  │ ├── Unweighted
  │ │ └── Test Accuracy
  │ ├── Weighted (Freezing Earlier Layers)
  │ │ └── Test Accuracy
  │ └── Weighted (All Layers Trainable)
  │   └── Test Accuracy
  ├── ResNet 50
  │ ├── Unweighted
  │ │ └── Test Accuracy
  │ └── Weighted
  │   └── Test Accuracy
  └── Inception V3
    └── Freezing
      └── Test Accuracy
Precision = True Positives / (True Positives + False Positives)
Recall (or Sensitivity) = True Positives / (True Positives + False Negatives)
F1 Score = 2 ∗ (Precision ∗ Recall) / (Precision + Recall)
Accuracy = (True Positives + True Negatives) / (True Positives + False Positives + True Negatives + False Negatives)
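These per-class metrics can be generated with scikit-learn's `classification_report`. A minimal sketch (the variable and class names are assumptions, not taken from the notebook):

```python
import numpy as np
from sklearn.metrics import classification_report

# predicted class = index of the highest softmax probability
predY = np.argmax(model.predict(testX), axis=1)
trueY = np.argmax(testY, axis=1)  # undo the one-hot encoding

# per-class precision, recall, and F1 score
print(classification_report(trueY, predY, target_names=["COVID", "Normal", "Viral"]))
```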
L(Y, Ŷ) = −∑ ( Y ∗ log(Ŷ) + (1 − Y) ∗ log(1 − Ŷ) )

where Y is the true label, Ŷ is the predicted label, and L(Y, Ŷ) is the loss function.
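As a sanity check, the formula can be evaluated directly with NumPy. A toy sketch (not taken from the notebook):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # L(Y, Y^) = -sum( Y*log(Y^) + (1-Y)*log(1-Y^) )
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(cross_entropy(np.array([1.0, 0.0, 0.0]), np.array([0.8, 0.1, 0.1])))  # ~0.434
```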
Data Augmentation is a technique to increase the diversity of your training set by applying random (but realistic) transformations such as image rotation.
```python
import tensorflow as tf
from tensorflow.keras import layers

data_augmentation = tf.keras.Sequential([
    layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    layers.experimental.preprocessing.RandomRotation(0.2),
])
```
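Calling the pipeline on a batch applies a fresh random transform each time. A usage sketch (`images` is assumed to be a float tensor of shape `(batch, 256, 256, 3)`):

```python
augmented = data_augmentation(images, training=True)  # random flips and rotations
```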
```python
import numpy as np

# stack the three class arrays into one dataset with matching labels
Dataset = np.concatenate((COVID, Normal, Viral))
Y = np.concatenate((Covidlabel, Normallabel, Virallabel))
```
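The three label arrays can be built beforehand by assigning one integer per class. A minimal sketch (the construction itself is an assumption; names match the snippet above):

```python
# integer labels: 0 = COVID, 1 = Normal, 2 = Viral
Covidlabel = np.zeros(len(COVID))
Normallabel = np.ones(len(Normal))
Virallabel = np.full(len(Viral), 2)
```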
```python
from sklearn.model_selection import train_test_split

# 70/30 train/test split, stratified to keep the class proportions
trainX, testX, trainY, testY = train_test_split(Dataset, Y, test_size=0.3, random_state=40, stratify=Y)
# hold out 20% of the training data for validation
trainX, valX, trainY, valY = train_test_split(trainX, trainY, test_size=0.2, random_state=1, stratify=trainY)
```
```python
from tensorflow.keras.utils import to_categorical

# one-hot encode the integer labels (3 classes)
trainY = to_categorical(trainY, 3)
testY = to_categorical(testY, 3)

# scale pixel values from [0, 255] to [0, 1]
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0
```
Problem Type | Output Type | Final Activation Function | Loss Function |
---|---|---|---|
Regression | Numerical Value | Linear | Mean Squared Error (MSE) |
Classification | Binary Outcome | Sigmoid | Binary Cross-Entropy |
Classification | Single Label, Multiple Classes | Softmax | Cross-Entropy |
Classification | Multiple Labels, Multiple Classes | Sigmoid | Binary Cross-Entropy |
Softmax | Sigmoid |
---|---|
Used for multi-class classification in a logistic regression model. | Used for binary classification in a logistic regression model. |
The probabilities sum to 1. | The probabilities need not sum to 1. |
Used in the different layers of neural networks. | Used as an activation function while building neural networks. |
The highest value receives a higher probability than the other values. | A high value receives a high probability, but not necessarily the highest one. |
CategoricalCrossentropy class | BinaryCrossentropy class |
---|---|
Computes the crossentropy loss between the labels and predictions. | Computes the cross-entropy loss between true labels and predicted labels. |
Use this crossentropy loss function when there are two or more label classes. | Use this cross-entropy loss when there are only two label classes (assumed to be 0 and 1). |
tf.keras.losses.categorical_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0) | tf.keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0) |
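For this project's 3-class, one-hot-encoded labels, the categorical variant is the right choice. A toy example (the values are illustrative):

```python
import tensorflow as tf

y_true = [[0.0, 1.0, 0.0]]  # one-hot label for class 1
y_pred = [[0.1, 0.8, 0.1]]  # softmax output

cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # ~0.223 = -log(0.8)
```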
- Gradient descent is an optimization algorithm used when training a machine learning model. It iteratively tweaks the model's parameters to minimize a given (ideally convex) function toward a local minimum.
- Momentum is like a ball rolling downhill. The ball will gain momentum as it rolls down the hill.
- Momentum helps accelerate Gradient Descent (GD) when we have surfaces that curve more steeply in one direction than in another.
- Nesterov acceleration optimization is like a ball rolling down the hill but knows exactly when to slow down before the gradient of the hill increases again.
- Adagrad is an adaptive learning rate method: it adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones.
- It is well suited to sparse data, as in large-scale neural networks. GloVe word embeddings use Adagrad, where infrequent words require greater updates and frequent words require smaller updates.
- Adadelta is an extension of Adagrad that tries to reduce Adagrad's aggressive, monotonically decreasing learning rate.
- It does this by restricting the window of accumulated past gradients to some fixed size w. The running average at time t then depends on the previous average and the current gradient.
- RMSProp is Root Mean Square Propagation. It was devised by Geoffrey Hinton.
- RMSProp tries to resolve Adagrad's radically diminishing learning rates by using a moving average of the squared gradient. It uses the magnitude of recent gradients to normalize the current gradient.
- Adam is another method that computes an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradients.
- It also reduces the radically diminishing learning rates of Adagrad.
- Adam can be viewed as a combination of Adagrad, which works well on sparse gradients, and RMSProp, which works well in online and non-stationary settings.
- Adam uses an exponential moving average of the gradients to scale the learning rate, instead of the simple average used in Adagrad; it keeps an exponentially decaying average of past gradients.
- Nadam combines NAG and Adam.
- Nadam is employed for noisy gradients or for gradients with high curvature.
- The learning process is accelerated by summing the exponential decay of the moving averages of the previous and current gradients (see the instantiation sketch below).
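All of these optimizers are available in Keras. A sketch of how each might be instantiated (the learning rates are illustrative, not the project's settings):

```python
import tensorflow as tf

sgd      = tf.keras.optimizers.SGD(learning_rate=1e-3)
momentum = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)
nesterov = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True)
adagrad  = tf.keras.optimizers.Adagrad(learning_rate=1e-2)
adadelta = tf.keras.optimizers.Adadelta(learning_rate=1.0)
rmsprop  = tf.keras.optimizers.RMSprop(learning_rate=1e-3)
adam     = tf.keras.optimizers.Adam(learning_rate=1e-3)
nadam    = tf.keras.optimizers.Nadam(learning_rate=1e-3)
```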
```python
EPOCHS = 10     # number of training epochs
BS = 64         # batch size
INIT_LR = 1e-3  # initial learning rate
```

- Optimizer: Adam
- Loss function: categorical cross-entropy
- The number of layers differs from one model to another.

FCN Few Layers (a minimal Keras sketch follows the list):
- Input layer with shape (256, 256, 3), flattened to a 1D array
- Fully connected layer with 1024 neurons with activation function ReLU
- Fully connected layer with 50 neurons with activation function ReLU
- Output layer with 3 neurons (same number of labels) with activation function Softmax
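A minimal Keras sketch of this model (the flattening step and the variable name are assumptions):

```python
from tensorflow.keras import Sequential, layers

fcn_few = Sequential([
    layers.Flatten(input_shape=(256, 256, 3)),  # flatten the image into a 1D vector
    layers.Dense(1024, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(3, activation="softmax"),      # one neuron per class
])
```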
FCN Many Layers:
- Input layer with shape (256, 256, 3)
- Fully connected layer with 1200 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 1000 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 900 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 800 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 256 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 128 neurons with activation function ReLU
- Dropout layer with 0.5
- Fully connected layer with 64 neurons with activation function ReLU
- Fully connected layer with 32 neurons with activation function ReLU
- Fully connected layer with 16 neurons with activation function ReLU
- Output layer with 3 neurons (same number of labels) with activation function Softmax
CNN Few Layers (a minimal Keras sketch follows the list):
- Convolution layer with 32 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 128 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Reshape the input to a 1D array.
- Fully connected layer with 600 neurons with activation function ReLU.
- Fully connected layer with 512 neurons with activation function ReLU.
- Dropout layer with 0.3.
- Output layer with 3 neurons (same number of labels) with activation function Softmax.
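A minimal Keras sketch of this model (the variable name is an assumption):

```python
from tensorflow.keras import Sequential, layers

cnn_few = Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                       # reshape to a 1D array
    layers.Dense(600, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(3, activation="softmax"),  # one neuron per class
])
```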
CNN Many Layers:
- Convolution layer with 32 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 64 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 256 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 512 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Convolution layer with 700 filters, kernel size (3, 3), and ReLU activation function.
- Max-pooling layer with size (2, 2).
- Fully connected layer with 256 neurons with activation function ReLU.
- Fully connected layer with 1024 neurons with activation function ReLU.
- Dropout layer with 0.3.
- Output layer with 3 neurons (same number of labels) with activation function Softmax.
```python
from tensorflow.keras.applications import InceptionV3, ResNet50, VGG16

# Inception V3: ImageNet weights, first 200 layers frozen
base_model = InceptionV3(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
for layer in base_model.layers[:200]:
    layer.trainable = False

# ResNet50: ImageNet weights
base_model = ResNet50(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
# ResNet50: random weights
base_model = ResNet50(input_shape=(256, 256, 3), include_top=False, weights=None)

# VGG16: ImageNet weights, all layers trainable
base_model = VGG16(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
# VGG16: ImageNet weights, all layers frozen
base_model = VGG16(input_shape=(256, 256, 3), include_top=False, weights='imagenet')
base_model.trainable = False
# VGG16: random weights
base_model = VGG16(input_shape=(256, 256, 3), include_top=False, weights=None)
```
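Each backbone is created with `include_top=False`, so a small classification head has to be attached before training. A sketch of one way to do it (the pooling and head layers are assumptions, not taken from the notebook):

```python
from tensorflow.keras import layers, Model

x = layers.GlobalAveragePooling2D()(base_model.output)
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(3, activation="softmax")(x)  # one neuron per class
model = Model(inputs=base_model.input, outputs=outputs)
```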
```python
import sys
import tensorflow as tf

def trainingloop(model, trainX, trainY, batchSize, epochs, optimizer):
    # number of batch updates per epoch
    numUpdates = int(trainX.shape[0] / batchSize)
    # loop over the number of epochs
    for epoch in range(0, epochs):
        # track the average loss over the epoch
        epoch_loss_avg = tf.keras.metrics.Mean()
        sys.stdout.flush()
        # loop over the data in batch-size increments
        for i in range(0, numUpdates):
            # determine starting and ending slice indexes for the current batch
            start = i * batchSize
            end = start + batchSize
            # take a gradient step on the current batch
            loss = step(model, trainX[start:end], trainY[start:end], optimizer)
            epoch_loss_avg.update_state(loss)
        print("[INFO] epoch {}/{} loss: {:.4f}".format(epoch + 1, epochs, epoch_loss_avg.result()))
```
```python
from tensorflow.keras.losses import categorical_crossentropy

def step(model, x, y, optimizer):
    # record the forward pass so gradients can be computed
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = categorical_crossentropy(y, pred)
    # compute the gradients and update the model's weights
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```
```python
def testing(modelx, testX, testY):
    # evaluate the compiled model on the held-out test set
    (loss, acc) = modelx.evaluate(testX, testY)
    print("[INFO] test accuracy: {:.4f}".format(acc))
```
- Train the entire model: use the architecture of the pre-trained model and train it on your data. You'll need a large dataset (and a lot of computational power).
- Train some layers and leave the others frozen: lower layers capture general features, while higher layers capture task-specific features. Here, we play with that dichotomy by choosing how many of the network's weights to adjust (a frozen layer does not change during training); see the sketch after this list.
  - Small dataset -> leave more layers frozen to avoid overfitting.
  - Large dataset -> improve your model by training more layers.
- Fixed feature extraction: keep the convolutional base in its original form and use its outputs to feed the classifier.
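In Keras, the second strategy reduces to toggling each layer's `trainable` flag. A minimal sketch (the 200-layer cut-off mirrors the Inception V3 initialization above):

```python
# freeze the earlier, general-feature layers; fine-tune the rest
for layer in base_model.layers[:200]:
    layer.trainable = False
for layer in base_model.layers[200:]:
    layer.trainable = True
```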
Models | Layers | Augmentation | Loss | Test Accuracy | F1-Score Class0 | Recall Class0 | Precision Class0 | F1-Score Class1 | Recall Class1 | Precision Class1 | F1-Score Class2 | Recall Class2 | Precision Class2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Fully Connected | Few Layers | Yes | 1.0535 | 77.66% | 79.18% | 88.75% | 71.48% | 84.04% | 95.83% | 74.84% | 62.78% | 46.39% | 97.09% |
Fully Connected | Many Layers | Yes | 0.4197 | 84.15% | 82.02% | 80.14% | 83.99% | 90.60% | 97.78% | 84.41% | 79.32% | 74.86% | 84.35% |
CNN | Few Layers | Yes | 0.8994 | 85.75% | 85.69% | 85.69% | 85.69% | 89.19% | 83.61% | 95.56% | 83.92% | 89.17% | 79.26% |
CNN | Many Layers | Yes | 0.3850 | 90.18% | 89.35% | 81.53% | 94.07% | 90.66% | 97.78% | 84.51% | 87.07% | 86.67% | 88.76% |
Fully Connected | Few Layers | No | 2.3815 | 82.76% | 80.32% | 98.51% | 67.81% | 91.48% | 85.00% | 90.03% | 74.15% | 62.13% | 91.94% |
Fully Connected | Many Layers | No | 0.7163 | 88.78% | 88.03% | 82.34% | 94.87% | 92.51% | 87.50% | 98.13% | 85.87% | 95.54% | 77.98% |
CNN | Few Layers | No | 0.2063 | 94.28% | 93.43% | 97.26% | 89.89% | 96.68% | 96.94% | 96.41% | 92.37% | 88.37% | 96.75% |
CNN | Many Layers | No | 0.2039 | 93.33% | 94.33% | 95.27% | 93.41% | 93.53% | 88.33% | 99.38% | 91.90% | 95.54% | 88.53% |
Models | Layers | Loss | Test Accuracy | F1-Score Class0 | Recall Class0 | Precision Class0 | F1-Score Class1 | Recall Class1 | Precision Class1 | F1-Score Class2 | Recall Class2 | Precision Class2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ResNet 50 | Pre-Trained Weighted | 0.2736 | 97.03% | 96.86% | 96.02% | 97.72% | 98.74% | 98.33% | 99.16% | 96.58% | 97.77% | 95.41% |
ResNet 50 | Random weights | 0.2559 | 94.08% | 93.28% | 91.54% | 95.09% | 97.02% | 96.11% | 98.03% | 92.18% | 94.80% | 98.70% |
VGG 16 | Pre-Trained Weighted | 0.1652 | 95.05% | 95.10% | 96.52% | 93.72% | 85.89% | 93.89% | 97.97% | 92.97% | 93.32% | 92.63% |
VGG 16 | Random weights | 0.2088 | 95.06% | 93.58% | 99.75% | 88.13% | 98.34% | 98.89% | 97.80% | 91.88% | 95.40% | 99.42% |
Inception V3 | Pre-Trained Freezing Layers | 0.1634 | 96.40% | 94.48% | 91.53% | 97.63% | 98.95% | 98.61% | 99.30% | 94.77% | 98.06% | 91.69% |
Hover over the plots to check the model name.
- Deep learning based detection and analysis of COVID-19 on chest X-ray images
- Detailed Guide to Understand and Implement ResNets
- Ahmed Samy
- Omar Reda