IDC (invasive ductal carcinoma) prediction in breast cancer histopathology images using deep residual learning, with an accuracy of 99.37% on a subset of 7,500 microscopic images.
The dataset was obtained from Kaggle: https://www.kaggle.com/paultimothymooney/predict-idc-in-breast-cancer-histology-images/notebook The full dataset contains 277,524 images (198,738 IDC negative and 78,786 IDC positive). Out of these, I used a subset of 7,500 images (3,000 IDC positive and 4,500 IDC negative) to avoid running into memory errors. The subset I used can be found at https://drive.google.com/file/d/1jv2nnyLxXSKSNGYt2_8Ars3nMHAsc3BT/view?usp=sharing
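As a rough illustration of how such a fixed-size subset could be drawn, here is a minimal sketch. It is not the procedure actually used for the shared subset. The function name `sample_subset` is hypothetical, and it assumes the class label is encoded in the file name suffix (`class0.png` for IDC negative, `class1.png` for IDC positive).

```python
import random


def sample_subset(paths, n_positive=3000, n_negative=4500, seed=0):
    """Return a shuffled list of (path, label) pairs with fixed class counts.

    Assumption: the label is encoded in the file name suffix,
    'class1.png' for IDC positive and 'class0.png' for IDC negative.
    """
    positives = [p for p in paths if p.endswith("class1.png")]
    negatives = [p for p in paths if p.endswith("class0.png")]

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    # rng.sample raises ValueError if fewer images exist than requested.
    subset = [(p, 1) for p in rng.sample(positives, n_positive)]
    subset += [(p, 0) for p in rng.sample(negatives, n_negative)]
    rng.shuffle(subset)
    return subset
```

Calling `sample_subset(all_paths)` with the default arguments would give 3,000 positive and 4,500 negative entries, matching the class counts described above.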
The images were originally scanned at 40x and hence have a very low resolution. To use deep learning algorithms, I had to resize them to a uniform resolution of 50 x 50. The preprocessing was done using the OpenCV resize method without maintaining the aspect ratio.
The weights uploaded in this repo are the ones that reached 99.377% accuracy, trained with a model containing 4 residual blocks and 143,714 parameters.
The images are microscopic images, so to improve feature extraction by the deep learning algorithms we used 4 extra channels: the L* and a* channels of the LAB color space and the hue and saturation channels of the HSV color space. The channel selection part is in the 2nd cell of the model's ipynb file.
The model is based on deep residual learning and performs extensive feature extraction to aid the classification task. It consists of 4 residual blocks, each with a shortcut path to allow residual learning and each containing convolutional layers with ELU activation. The architecture can be viewed in 'model.png'.
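To make the idea of a residual block with a shortcut path concrete, here is a hedged Keras sketch. It is not the exact architecture from 'model.png': the filter count, kernel sizes, the 1x1 projection on the shortcut, and the 7-channel input shape are all assumptions chosen for illustration.

```python
from keras.layers import Input, Conv2D, Activation, Add
from keras.models import Model


def residual_block(x, filters):
    """One residual block: two 3x3 convolutions plus a shortcut, with ELU."""
    # Assumption: a 1x1 convolution projects the shortcut so its channel
    # count always matches the main path before the addition.
    shortcut = Conv2D(filters, (1, 1), padding="same")(x)

    y = Conv2D(filters, (3, 3), padding="same")(x)
    y = Activation("elu")(y)
    y = Conv2D(filters, (3, 3), padding="same")(y)

    # Residual learning: the block output is main path + shortcut.
    y = Add()([y, shortcut])
    return Activation("elu")(y)


# Hypothetical usage on a 50 x 50 patch with 7 input channels.
inputs = Input(shape=(50, 50, 7))
outputs = residual_block(inputs, filters=16)
model = Model(inputs, outputs)
```

Stacking four such blocks, followed by a classification head, would follow the general pattern described above; the parameter count of this sketch is not meant to match the 143,714 parameters of the trained model.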
The model can be trained by running the model.ipynb file after setting the training parameters in the 1st cell, under the specifying parameters section, according to your own requirements.
Grad-CAM visualization has been performed to validate the model's performance, along with the training and loss curves of the model.
Keras(v2.2.2)
tensorflow(v1.9.0)
keras-vis(v0.4.1)(optional, depends on the user)
scikit-learn(v0.19.2)
mlxtend(v0.13.0)