- Zero to Mastery Deep Learning with TensorFlow course
- https://github.com/mrdbourke/tensorflow-deep-learning
- CNN Explainer
- Neural Network Playground
- TensorFlow Hub
No. | Notebook Dir | Key Summary | Data |
---|---|---|---|
01 | TF Regression | Typical Architecture for Regression | - |
02 | TF Classification | Typical Architecture for Classification | - |
03 | TF CNN | Typical Architecture for CNN | 10_food_classes_all_data |
04 | TF Transfer Learning : Feature Extraction | Transfer Learning Feature Extraction | |
05 | TF Transfer Learning : Fine Tuning | Transfer Learning Fine Tuning | 10_food_classes_all_data |
06 | TF Transfer Learning : Scaling Up | Transfer Learning Scaling Up | 101_food_classes_all_data |
07 | Natural Language Processing | Natural Language Processing Techniques | |
Hyperparameter | Typical value |
---|---|
Input layer shape | Same shape as number of features (e.g. 3 for # bedrooms, # bathrooms, # car spaces in housing price prediction) |
Hidden layer(s) | Problem specific, minimum = 1, maximum = unlimited |
Neurons per hidden layer | Problem specific, generally 10 to 100 |
Output layer shape | Same shape as desired prediction shape (e.g. 1 for house price) |
Hidden activation | Usually ReLU (rectified linear unit) |
Output activation | None, ReLU, logistic/tanh |
Loss function | MSE (mean squared error) or MAE (mean absolute error)/Huber (combination of MAE/MSE) if outliers |
Optimizer | SGD (stochastic gradient descent), Adam |
Table 1: Typical architecture of a regression network. Source: Adapted from page 293 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow Book by Aurélien Géron
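To make the table concrete, here's a minimal sketch of a regression model built with the Keras Sequential API using the typical values above (the two hidden layer sizes and the 3-feature housing example are illustrative assumptions, not values taken from the notebooks):

```python
import tensorflow as tf

# Minimal regression model sketch (assumed 3 input features, e.g. bedrooms, bathrooms, car spaces)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),              # input layer shape = number of features
    tf.keras.layers.Dense(100, activation="relu"),  # hidden layers, usually ReLU activation
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)                        # output layer shape = 1 (e.g. house price), no activation
])

model.compile(loss=tf.keras.losses.mae,             # MAE loss (MSE or Huber also common)
              optimizer=tf.keras.optimizers.Adam(), # Adam optimizer (SGD also common)
              metrics=["mae"])
```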
Hyperparameter | Binary Classification | Multiclass classification |
---|---|---|
Input layer shape | Same as number of features (e.g. 5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification |
Hidden layer(s) | Problem specific, minimum = 1, maximum = unlimited | Same as binary classification |
Neurons per hidden layer | Problem specific, generally 10 to 100 | Same as binary classification |
Output layer shape | 1 (one class or the other) | 1 per class (e.g. 3 for food, person or dog photo) |
Hidden activation | Usually ReLU (rectified linear unit) | Same as binary classification |
Output activation | Sigmoid | Softmax |
Loss function | Cross entropy (tf.keras.losses.BinaryCrossentropy in TensorFlow) | Cross entropy (tf.keras.losses.CategoricalCrossentropy in TensorFlow) |
Optimizer | SGD (stochastic gradient descent), Adam | Same as binary classification |
Table 2: Typical architecture of a classification network. Source: Adapted from page 295 of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow Book by Aurélien Géron
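As a hedged sketch of the multiclass column (the feature count, hidden layer sizes and class count are illustrative placeholders only):

```python
import tensorflow as tf

NUM_FEATURES = 5   # illustrative input feature count
NUM_CLASSES = 3    # illustrative number of classes (e.g. food, person or dog photo)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),              # input layer shape = number of features
    tf.keras.layers.Dense(100, activation="relu"),             # hidden layers with ReLU activation
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")   # 1 output neuron per class, softmax activation
])

model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),  # for binary problems: BinaryCrossentropy + 1 sigmoid output
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])
```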
Hyperparameter/Layer type | What does it do? | Typical values |
---|---|---|
Input image(s) | Target images you'd like to discover patterns in | Whatever you can take a photo (or video) of |
Input layer | Takes in target images and preprocesses them for further layers | input_shape = [batch_size, image_height, image_width, color_channels] |
Convolution layer | Extracts/learns the most important features from target images | Multiple, can create with tf.keras.layers.ConvXD (X can be multiple values) |
Hidden activation | Adds non-linearity to learned features (non-straight lines) | Usually ReLU (tf.keras.activations.relu) |
Pooling layer | Reduces the dimensionality of learned image features | Average (tf.keras.layers.AvgPool2D) or Max (tf.keras.layers.MaxPool2D) |
Fully connected layer | Further refines learned features from convolution layers | tf.keras.layers.Dense |
Output layer | Takes learned features and outputs them in shape of target labels | output_shape = [number_of_classes] (e.g. 3 for pizza, steak or sushi) |
Output activation | Adds non-linearities to output layer | tf.keras.activations.sigmoid (binary classification) or tf.keras.activations.softmax |
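Putting the layers above together, a small TinyVGG-style CNN might look like the following sketch (the filter counts, image size and 3-class output are illustrative assumptions):

```python
import tensorflow as tf

# Sketch of a small CNN following the table above
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),                    # [image_height, image_width, color_channels]
    tf.keras.layers.Conv2D(10, kernel_size=3, activation="relu"),  # convolution layer + hidden activation
    tf.keras.layers.Conv2D(10, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPool2D(),                                   # pooling layer reduces feature dimensionality
    tf.keras.layers.Conv2D(10, kernel_size=3, activation="relu"),
    tf.keras.layers.Conv2D(10, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax")                 # output layer: 1 neuron per class (pizza, steak, sushi)
])

model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])
```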
Hyperparameter Name | Description | Typical Values |
---|---|---|
Filters | How many filters should pass over an input tensor | 10, 32, 64, 128 (higher values = more complex features) |
Kernel size (filter size) | Shape of the sliding window (filter) that passes over the input | 3, 5, 7 (lower value = smaller features, higher value = larger features) |
Padding | Pads the target tensor with 0s at the border (if 'same') to preserve the input shape, or leaves the target tensor as is (if 'valid'), lowering the output shape | 'same' or 'valid' |
Strides | Number of steps a filter takes across an image at a time (if strides = 1, a filter moves across an image 1 pixel at a time) | 1 (default), 2 |
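These hyperparameters map directly onto the arguments of tf.keras.layers.Conv2D; a quick sketch (the specific values are illustrative only):

```python
import tensorflow as tf

# Illustrative Conv2D layer showing the hyperparameters from the table
conv_layer = tf.keras.layers.Conv2D(
    filters=10,        # number of filters passing over the input tensor
    kernel_size=3,     # 3x3 sliding window
    strides=1,         # move 1 pixel at a time
    padding="same",    # pad with zeros so output height/width match the input ('valid' would shrink them)
    activation="relu"
)

dummy_images = tf.random.normal(shape=(1, 224, 224, 3))  # [batch, height, width, channels]
print(conv_layer(dummy_images).shape)                    # -> (1, 224, 224, 10) with padding="same"
```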
- "As is" transfer learning is when you take a pretrained model as it is and apply it to your task without any changes.
- For example, many computer vision models are pretrained on the ImageNet dataset, which contains 1000 different classes of images. This means passing a single image to this model will produce 1000 different prediction probability values (1 for each class).
- This is helpful if you have 1000 classes of image you'd like to classify and they're all the same as the ImageNet classes. However, it's not helpful if you want to classify only a small subset of classes (such as 10 different kinds of food). Models with "/classification" in their name on TensorFlow Hub provide this kind of functionality.
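As a hedged illustration of "as is" transfer learning (EfficientNetB0 from tf.keras.applications is just one example of an ImageNet-pretrained model, not necessarily the one used in the notebooks):

```python
import tensorflow as tf

# "As is" transfer learning: use an ImageNet-pretrained model with its original 1000-class head
model = tf.keras.applications.EfficientNetB0(include_top=True, weights="imagenet")

images = tf.random.normal(shape=(1, 224, 224, 3))  # stand-in for a batch of real 224x224 images
predictions = model(images)
print(predictions.shape)  # (1, 1000) -> one prediction probability per ImageNet class
```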
- Feature extraction transfer learning is when you take the underlying patterns (also called weights) a pretrained model has learned and adjust its outputs to be more suited to your problem.
- For example, say the pretrained model you were using had 236 different layers (EfficientNetB0 has 236 layers), but the top layer outputs 1000 classes because it was pretrained on ImageNet. To adjust this to your own problem, you might remove the original activation layer and replace it with your own but with the right number of output classes.
- The important part here is that only the top few layers become trainable, the rest remain frozen. This way all the underlying patterns remain in the rest of the layers and you can utilise them for your own problem. This kind of transfer learning is very helpful when your data is similar to the data a model has been pretrained on.
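A hedged sketch of feature extraction with the Keras Functional API (EfficientNetB0 and the 10-class food example are illustrative assumptions):

```python
import tensorflow as tf

# Feature extraction: freeze the pretrained base, train only the new top layers
base_model = tf.keras.applications.EfficientNetB0(include_top=False, weights="imagenet")
base_model.trainable = False  # freeze all underlying learned patterns

inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)                         # keep batch norm layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)                # condense feature maps into a feature vector
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)   # new output layer for 10 food classes
model = tf.keras.Model(inputs, outputs)

model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])
```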
- Fine-tuning transfer learning is when you take the underlying patterns (also called weights) of a pretrained model and adjust (fine-tune) them to your own problem.
- This usually means training some, many or all of the layers in the pretrained model. This is useful when you've got a large dataset (e.g. 100+ images per class) where your data is slightly different to the data the original model was trained on.
- A common workflow is to "freeze" all of the learned patterns in the bottom layers of a pretrained model so they're untrainable, and then train the top 2-3 layers so that the pretrained model can adjust its outputs to your custom data (feature extraction).
- After you've trained the top 2-3 layers, you can then gradually "unfreeze" more and more layers and run the training process on your own data to further fine-tune the pretrained model.
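Continuing the feature extraction sketch above, a fine-tuning step might look like this (unfreezing the top 10 layers and using a 10x lower learning rate are illustrative choices, not prescribed values):

```python
# Fine-tuning: unfreeze some of the top layers of the pretrained base and keep training
base_model.trainable = True
for layer in base_model.layers[:-10]:   # keep all but the top 10 layers frozen
    layer.trainable = False

# Recompile with a lower learning rate so the pretrained weights are only adjusted slightly
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              metrics=["accuracy"])

# Then continue training with model.fit(...) on your own data to further fine-tune the model
```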