8.10 Data augmentation

In this video, I had a typo/bug: instead of using val_gen for generating images for validation, I used train_gen. That's why adding augmentations didn't help in the video.

Slides

Data augmentation is the process of artificially increasing the amount of data by generating new images from existing ones. This includes making minor alterations to images, such as flipping, cropping, adjusting brightness and/or contrast, and many more.
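To make the idea concrete, here is a minimal NumPy sketch of three such alterations (flip, crop, brightness). It only illustrates the concept and is not how Keras implements augmentation internally. The function names and the random stand-in image are assumptions made for this example.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror the image left-to-right by reversing the width axis."""
    return image[:, ::-1, :]


def random_crop(image, crop_height, crop_width, rng):
    """Cut a randomly placed crop_height x crop_width window out of the image."""
    height, width, _ = image.shape
    top = rng.integers(0, height - crop_height + 1)
    left = rng.integers(0, width - crop_width + 1)
    return image[top:top + crop_height, left:left + crop_width, :]


def adjust_brightness(image, factor):
    """Scale pixel values by `factor`, keeping them in the valid 0-255 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


rng = np.random.default_rng(seed=0)

# Stand-in for a real photo: a random 150x150 RGB image (height, width, channels)
image = rng.integers(0, 256, size=(150, 150, 3), dtype=np.uint8)

# Each call produces a new, slightly different image from the same original
augmented = [
    horizontal_flip(image),
    random_crop(image, 120, 120, rng),
    adjust_brightness(image, 1.2),
]
```

Every entry in `augmented` keeps the original label, so one labelled image yields several training examples.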

The Keras ImageDataGenerator class has many parameters for data augmentation that we can use for generating data. An important thing to remember is that data augmentation should only be applied to the training data, not the validation data. Here's how we can generate augmented data for training the model:

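# ImageDataGenerator ships with Keras. preprocess_input is assumed to be imported
# from the pretrained model's module (e.g. tensorflow.keras.applications.xception),
# and train_imgs_dir is assumed to point to the training images folder.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create image generator for train data and also augment the images
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input,
                               rotation_range=30,        # rotate by up to 30 degrees
                               width_shift_range=10.0,   # shift horizontally by up to 10 pixels
                               height_shift_range=10.0,  # shift vertically by up to 10 pixels
                               shear_range=10,           # shear by up to 10 degrees
                               zoom_range=0.1,           # zoom in or out by up to 10%
                               vertical_flip=True)       # randomly flip top-to-bottom

# Read images from the training folder in batches, resized to 150x150
train_ds = train_gen.flow_from_directory(directory=train_imgs_dir,
                                         target_size=(150,150),
                                         batch_size=32)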
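For contrast, here is a sketch of how the validation generator could be set up next to the training one: it keeps the same preprocessing but has no augmentation parameters. This is an illustration under assumptions, not the course's exact code: `preprocess_input` is assumed to come from Xception, and `make_datasets` with its `val_dir` argument is a hypothetical helper introduced here.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Assumption: the model is based on Xception; swap in the module of your own model
from tensorflow.keras.applications.xception import preprocess_input

# Training generator: preprocessing plus random augmentations
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input,
                               rotation_range=30,
                               zoom_range=0.1,
                               vertical_flip=True)

# Validation generator: preprocessing only, so validation images stay unchanged
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)


def make_datasets(train_dir, val_dir, target_size=(150, 150), batch_size=32):
    """Hypothetical helper: build train and validation iterators from two folders."""
    train_ds = train_gen.flow_from_directory(directory=train_dir,
                                             target_size=target_size,
                                             batch_size=batch_size)
    # Note val_gen here, not train_gen; shuffle=False keeps the order fixed
    val_ds = val_gen.flow_from_directory(directory=val_dir,
                                         target_size=target_size,
                                         batch_size=batch_size,
                                         shuffle=False)
    return train_ds, val_ds
```

Calling `make_datasets(train_imgs_dir, val_imgs_dir)` (where `val_imgs_dir` is an assumed path to the validation images) returns both iterators. Using `train_gen` for the validation folder would augment the validation images as well, which is the bug mentioned at the top of these notes.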

How to choose augmentations?

  • The first step is to use our own judgement: looking at the images (both train and validation), does it make sense to introduce, for example, a horizontal flip?
  • Look at the dataset: what kinds of variations are there? Are objects always centered?
  • Augmentations are hyperparameters: as with other hyperparameters, we often need to test whether a given augmentation helps the model. If the model doesn't improve, or performs the same, after a certain number of epochs (say 20), we don't use that augmentation.

Training with augmented data usually requires more epochs.
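Treating an augmentation as a hyperparameter amounts to training once without it and once with it, then comparing validation accuracy. The sketch below shows one possible way to make that comparison. Both function names and the `min_gain` threshold are hypothetical, and it assumes the model was compiled with `metrics=['accuracy']`, so that the `history` dictionary returned by Keras `fit` contains a `val_accuracy` list.

```python
def best_val_accuracy(history_dict):
    """Return the highest validation accuracy reached in any epoch."""
    return max(history_dict["val_accuracy"])


def keep_augmentation(baseline_history, augmented_history, min_gain=0.0):
    """Decide whether an augmentation is worth keeping.

    Both arguments are dictionaries shaped like `History.history` in Keras.
    The augmentation is kept only if its best validation accuracy beats the
    baseline's best by more than `min_gain`.
    """
    gain = best_val_accuracy(augmented_history) - best_val_accuracy(baseline_history)
    return gain > min_gain
```

With two training runs of the same length, this could be called as `keep_augmentation(history_plain.history, history_aug.history)`, where both variable names are placeholders for the objects returned by `model.fit`.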

Notes

Add notes from the video (PRs are welcome)

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
