One of the most important hyperparameters of deep learning models is the learning rate. It is a tuning parameter of the optimization algorithm that determines the step size (how big or small) at each iteration while moving toward a minimum of the loss function.
We can experiment with different learning rates to find the optimal value, i.e. the one for which the model gives the best results. In order to try different learning rates, we first need to define a function that creates the model, for instance:
from tensorflow import keras
from tensorflow.keras.applications.xception import Xception

# Function to create a model with a given learning rate
def make_model(learning_rate=0.01):
    # Pre-trained convolutional base (frozen, used for transfer learning)
    base_model = Xception(weights='imagenet',
                          include_top=False,
                          input_shape=(150, 150, 3))
    base_model.trainable = False

    #########################################

    inputs = keras.Input(shape=(150, 150, 3))
    base = base_model(inputs, training=False)
    vectors = keras.layers.GlobalAveragePooling2D()(base)
    outputs = keras.layers.Dense(10)(vectors)
    model = keras.Model(inputs, outputs)

    #########################################

    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    loss = keras.losses.CategoricalCrossentropy(from_logits=True)

    # Compile the model
    model.compile(optimizer=optimizer,
                  loss=loss,
                  metrics=['accuracy'])

    return model
Next, we can loop over the list of learning rates:
# Dictionary to store the training history for each learning rate
scores = {}

# List of learning rates to try
lrs = [0.0001, 0.001, 0.01, 0.1]

for lr in lrs:
    print(lr)

    model = make_model(learning_rate=lr)
    history = model.fit(train_ds, epochs=10, validation_data=val_ds)
    scores[lr] = history.history

    print()
    print()
Visualizing the training and validation accuracies helps us determine which learning rate value is best for the model. One typical way to pick the best value is to look at the gap between training and validation accuracy: a smaller gap indicates a better learning rate.
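For example, here is a minimal plotting sketch, assuming matplotlib is available and `scores` is the dictionary collected in the loop above, where each entry is a Keras `History.history` dict containing 'accuracy' and 'val_accuracy' lists:
import matplotlib.pyplot as plt

# Plot training and validation accuracy for every learning rate we tried
for lr, hist in scores.items():
    plt.plot(hist['accuracy'], linestyle='--', label=f'train lr={lr}')
    plt.plot(hist['val_accuracy'], label=f'val lr={lr}')

plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()
Comparing the curves, we look for the learning rate whose validation accuracy is highest while staying close to the training accuracy.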
Add notes from the video (PRs are welcome)
- learning rate analogy: the speed of reading a book
- reading fast (skimming, thus missing details) vs reading slow (not much progress, never getting through the book)
- finding the optimal learning rate
The notes are written by the community. If you see an error here, please create a PR with a fix.