CUB 200 Image Classification

In this tutorial we are going to train a classification model using the CUB-200-2011 dataset. This dataset contains 200 species of birds, each with roughly 30 training images and 30 testing images, and has become a staple for testing new ideas for fine-grained visual classification.

A few research papers claim that they can get over 80% accuracy on this dataset using only the images; no bounding boxes or part annotations are needed. Jaderberg et al. claim that they can achieve 82.3% with the Inception-V2 architecture and Krause et al. claim that they can get 84.4% accuracy with the Inception-V3 architecture. In this tutorial we'll use the Inception-V3 architecture and see if we can reproduce the 84.4% accuracy.

Download the dataset

You can find the dataset website here. The dataset files are relatively small (about 1.3 GB when untarred) and should easily fit on your machine.

$ wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
$ tar -xzf CUB_200_2011.tgz

Create the tfrecord files

We will use the tfrecords repo to create the tfrecord files that we can use to train and test the model. You'll need to clone that repo:

$ cd ~/code
$ git clone https://github.com/visipedia/tfrecords.git

Before we can call the create() method in create_tfrecords.py we will need to format the CUB data. You can find a script for doing this formatting here. Fire up an ipython terminal and %cpaste that script into the terminal. Now we can format the CUB dataset:

# make sure you have %cpasted the script

# Change these paths to match the location of the CUB dataset on your machine 
cub_dataset_dir = "/media/drive2/datasets/CUB_200_2011"
cub_image_dir = "/media/drive2/datasets/CUB_200_2011/images"

# we need to create a file containing the size of each image in the dataset. 
# you only need to do this once. scipy is required for this method. 
# Alternatively, you can create this file yourself. 
# Each line of the file should have <image_id> <width> <height>
create_image_sizes_file(cub_dataset_dir, cub_image_dir)

# Now we can create the datasets
train, test = format_dataset(cub_dataset_dir, cub_image_dir)
train, val = create_validation_split(train, fraction_per_class=0.1, shuffle=True)

We have created three arrays holding train, validation and test data. The CUB-200 dataset does not come with a standard validation set, so we took 10% of the train data and created a validation set. The number of elements in each array should be:

Number of train images: 5394
Number of validation images: 600
Number of test images: 5794
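To make the idea of a per-class validation split concrete, here is a minimal sketch of how such a split could work. It is not the implementation of create_validation_split from the formatting script; the function name split_per_class, the label_of callback and the toy record layout are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_per_class(records, label_of, fraction_per_class=0.1, shuffle=True, seed=0):
    """Hold out roughly `fraction_per_class` of every class for validation.

    Hypothetical helper: `label_of` maps a record to its class label.
    """
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        items = list(by_class[label])
        if shuffle:
            rng.shuffle(items)
        num_val = int(round(fraction_per_class * len(items)))
        val.extend(items[:num_val])
        train.extend(items[num_val:])
    return train, val


# Toy data: 4 classes with 30 images each (made-up record layout).
toy = [{"id": f"{c}_{i}", "label": c} for c in range(4) for i in range(30)]
toy_train, toy_val = split_per_class(toy, label_of=lambda r: r["label"])
print(len(toy_train), len(toy_val))  # 108 12
```

Splitting inside each class, rather than over the whole array, keeps every species represented in both the train and validation sets.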

We can now pass these arrays to the create() method:

from create_tfrecords import create

# Change this path
dataset_dir = "/media/drive2/tensorflow_datasets/cub/with_600_val_split/"

train_errors = create(dataset=train, dataset_name="train", output_directory=dataset_dir,
                      num_shards=10, num_threads=2, shuffle=True)

val_errors = create(dataset=val, dataset_name="val", output_directory=dataset_dir,
                    num_shards=4, num_threads=2, shuffle=True)

test_errors = create(dataset=test, dataset_name="test", output_directory=dataset_dir,
                     num_shards=10, num_threads=2, shuffle=True)

We now have a dataset directory containing tfrecord files prefixed with either train, val or test that we can use to train and test a model.

I'll assume that the path to the dataset directory is stored in the DATASET_DIR environment variable for the rest of the tutorial. For example:

$ export DATASET_DIR=/media/drive2/tensorflow_datasets/cub/with_600_val_split

Experiment Directory

We'll store all the experiment files in a directory called cub_image_experiment/. Create the following directory structure:

cub_image_experiment/
- logdir/
  - val_summaries/
  - test_summaries/
  - finetune/
    - val_summaries/

I'll assume that the path to cub_image_experiment is stored in the EXPERIMENT_DIR environment variable for the rest of the tutorial. For example:

$ export EXPERIMENT_DIR=/media/drive2/tensorflow_experiments/ebird/cub_image_experiment
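If you would rather not create the directories by hand, a short Python sketch like the following builds the same layout. The helper name make_experiment_dirs is my own; the demo runs in a temporary directory so that it has no side effects.

```python
import tempfile
from pathlib import Path


def make_experiment_dirs(experiment_dir):
    """Create the logdir layout used in this tutorial under `experiment_dir`."""
    root = Path(experiment_dir)
    subdirs = [
        "logdir/val_summaries",
        "logdir/test_summaries",
        "logdir/finetune/val_summaries",
    ]
    for sub in subdirs:
        # parents=True also creates logdir/ and logdir/finetune/ as needed.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = make_experiment_dirs(Path(tmp) / "cub_image_experiment")
    for path in sorted(root.rglob("*")):
        print(path.relative_to(root).as_posix())
```

In practice you would call make_experiment_dirs(os.environ["EXPERIMENT_DIR"]) once, after exporting the variable.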

Configuration Files

We'll need two configuration files:

config_train.yaml: This will contain all the configurations for image augmentation, the optimizer, the learning rate, model regularization, snapshotting the model and a few other things.
config_test.yaml: This will contain only the necessary configurations to test a model. It is essentially a subset of the training configurations with the image augmentations turned off.

The train configuration file can be found here, and the test configuration file can be found here. You should copy these two configuration files and put them into the cub_image_experiment directory. I have renamed them config_train.yaml and config_test.yaml respectively. A few important parameters are the following:

Config Name	Value	Description
NUM_CLASSES	200	The number of classes (i.e. bird species) in the CUB dataset.
NUM_TRAIN_EXAMPLES	5394	The number of training images in our training tfrecords. We need to know this number so that we can calculate the size of an epoch, which is the number of iterations (i.e. batches) it takes to go through all of the training data.
NUM_TRAIN_ITERATIONS	24000	The maximum number of iterations to execute before stopping. This is more of a convenience parameter than anything and allows us to stop the training at a specific point.
BATCH_SIZE	32	The number of images to process in one iteration. If you find that your machine is running out of memory then you can make this value smaller (e.g. 16 or 8, etc.).
MODEL_NAME	inception_v3	We are going to use the Inception V3 architecture as our model.
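As a quick sanity check on these values, the arithmetic below works out how many iterations make up one epoch and roughly how many epochs the iteration limit corresponds to. It only uses the numbers from the table above.

```python
import math

NUM_TRAIN_EXAMPLES = 5394
BATCH_SIZE = 32
NUM_TRAIN_ITERATIONS = 24000

# Number of batches needed to see every training image once.
iterations_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / BATCH_SIZE)

# Number of passes over the training data when the iteration limit is hit.
epochs_at_limit = NUM_TRAIN_ITERATIONS * BATCH_SIZE / NUM_TRAIN_EXAMPLES

print(iterations_per_epoch)       # 169
print(round(epochs_at_limit, 1))  # 142.4
```

If you lower BATCH_SIZE to fit in memory, an epoch takes proportionally more iterations, so the same NUM_TRAIN_ITERATIONS covers fewer epochs.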

I also like to create a file called cmds.txt where I compose the commands for running the different scripts. We'll add commands to this file as we go. Here is an example cmds.txt file.

Your experiment directory should now look like:

cub_image_experiment/
- logdir/
  - val_summaries/
  - test_summaries/
  - finetune/
    - val_summaries/
- cmds.txt
- config_train.yaml
- config_test.yaml

Data visualization

It is always a good idea to visualize the inputs to the network before starting the training process. This helps identify problems with the data or with your configuration files.

Install the tf_classification repo if you haven't done so already. cd to the tf_classification repo and execute the following command:

$ CUDA_VISIBLE_DEVICES=1 python visualize_train_inputs.py \
--tfrecords $DATASET_DIR/train* \
--config $EXPERIMENT_DIR/config_train.yaml \
--text_labels

The CUDA_VISIBLE_DEVICES=1 environment variable specifies that this process should only use the GPU with id 1 (inside the process, that GPU then shows up as device 0). This prevents TensorFlow from allocating memory on all of the GPUs in the machine.
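If you launch the scripts from Python rather than from the shell, the same restriction can be applied by passing a modified environment to the child process. This is a small sketch; the helper name env_with_visible_gpus is hypothetical.

```python
import os


def env_with_visible_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPU ids."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


env = env_with_visible_gpus([1])
print(env["CUDA_VISIBLE_DEVICES"])  # 1
```

The returned dictionary can be handed to subprocess.run through its env argument. The variable has to be in place before the child process initializes CUDA, which is why it is set on the environment rather than inside the script after TensorFlow has been imported.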

Input Visualization

You should see a matplotlib window open that shows you the original image, and the image after all augmentations have been performed. At this point we want to do a sanity check on the augmentations. Typically we don't want the augmentations to be so aggressive that they change the label of the image. For example, when doing a random crop of the image with certain parameter settings, it is possible that the crop does not contain any part of the bird, it could just contain a patch of sky. You can adjust the parameters of the augmentations until you are satisfied with the resulting visualizations.

With that said, I have seen better performance on the CUB-200 dataset when I allow more aggressive augmentations. For example, setting the IMAGE_PROCESSING.RANDOM_CROP_CFG.MIN_AREA to 0.1 rather than 0.6 achieves higher test time accuracy. A possible explanation for this is that the small background crops act as another source of regularization, encouraging the model to ignore these regions and focus more on the bird. It could also be due to the small size of the CUB-200 dataset. However, this deserves a more thorough analysis.
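To get a feel for what a minimum crop area means, here is a minimal sketch of a random crop sampler. It is not the augmentation code from the tf_classification repo; the function name, the aspect ratio range and the retry logic are assumptions chosen for illustration, and only the idea of a MIN_AREA fraction comes from the config.

```python
import random


def sample_random_crop(width, height, min_area, max_area=1.0,
                       aspect_range=(0.75, 1.33), rng=random, max_attempts=100):
    """Sample (x, y, w, h) for a crop covering min_area..max_area of the image."""
    for _ in range(max_attempts):
        area = rng.uniform(min_area, max_area) * width * height
        aspect = rng.uniform(*aspect_range)  # crop width / crop height
        w = int(round((area * aspect) ** 0.5))
        h = int(round((area / aspect) ** 0.5))
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    # No valid crop found: fall back to the full image.
    return 0, 0, width, height


# Compare the average fraction of the image kept for two MIN_AREA settings.
rng = random.Random(0)
for min_area in (0.1, 0.6):
    crops = [sample_random_crop(500, 375, min_area, rng=rng) for _ in range(1000)]
    mean_fraction = sum(w * h for _, _, w, h in crops) / (1000 * 500 * 375)
    print(min_area, round(mean_fraction, 2))
```

With the smaller minimum, many crops cover only a small part of the image, which is where the background-only patches mentioned above come from.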

Download an ImageNet pretrained model

The CUB-200 dataset is not large enough to train the Inception V3 architecture from scratch. The model would overfit to the training data and have poor generalization performance. Luckily, there are pretrained networks available. These networks have been trained on the ImageNet 2012 classification challenge with 1.2 million images, and their trained weights have been made publicly available. We can download the checkpoint file for the pretrained Inception V3 model:

$ wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
$ tar -xzf inception_v3_2016_08_28.tar.gz

I'll assume that the path to the checkpoint file is stored in the IMAGENET_PRETRAINED_MODEL environment variable. For example:

$ export IMAGENET_PRETRAINED_MODEL=/media/drive3/tensorflow_models/inception_v3.ckpt

Warm up training

Using a pretrained ImageNet model on the CUB-200 dataset requires us to replace the final classification layer. This is required because that layer is specific to the dataset (and specifically the number of classes in that dataset) that was used to train the model. The rest of the layers can be treated as dataset agnostic. Therefore, to train the model to classify the 200 species of birds, we want to load in all the weights from the pretrained ImageNet model except for the weights corresponding to the final classification layer. Those weights will be initialized with random values.

A training protocol that typically leads to good performance is to first train only those new layers that get initialized with random values, leaving the rest of the weights fixed. Once those new layers have "warmed up" then we will train all of the weights together. However, this "warm up" step is not necessary and we could just train all of the weights together from the get go. From my experience, this "warm up" step does not always lead to better performance, but I have not seen it lead to worse performance when compared to a model that was trained without the "warm up" step.

For the "warm up" training step we can use the --checkpoint_exclude_scopes directive to load in all weights except the final fully connected layers and then use the --trainable_scopes directive to only train those new layers. The command looks like:

$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir/finetune \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $IMAGENET_PRETRAINED_MODEL \
--trainable_scopes InceptionV3/Logits InceptionV3/AuxLogits \
--checkpoint_exclude_scopes InceptionV3/Logits InceptionV3/AuxLogits \
--learning_rate_decay_type fixed \
--lr 0.01
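The two scope flags in the command above boil down to filtering variables by name prefix. The sketch below shows that idea in plain Python. It is not the code train.py uses, and the variable names in the example list are only illustrative.

```python
def in_any_scope(name, scopes):
    """True if the variable name equals a scope or lives underneath it."""
    return any(name == s or name.startswith(s + "/") for s in scopes)


def split_variables(variable_names, exclude_scopes, trainable_scopes):
    """Return (names to restore from the checkpoint, names to train)."""
    to_restore = [n for n in variable_names if not in_any_scope(n, exclude_scopes)]
    to_train = [n for n in variable_names if in_any_scope(n, trainable_scopes)]
    return to_restore, to_train


# Illustrative variable names.
names = [
    "InceptionV3/Conv2d_1a_3x3/weights",
    "InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights",
    "InceptionV3/AuxLogits/Conv2d_2b_1x1/weights",
    "InceptionV3/Logits/Conv2d_1c_1x1/weights",
]
scopes = ["InceptionV3/Logits", "InceptionV3/AuxLogits"]

to_restore, to_train = split_variables(names, scopes, scopes)
print(to_restore)  # the two backbone variables
print(to_train)    # the AuxLogits and Logits variables
```

During warm up the two lists are complementary: everything that is restored from ImageNet stays frozen, and everything that is randomly initialized gets trained.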

For the warm up phase we won't worry about decaying the learning rate and instead opt to fix it at a relatively high value. How will we know when the new layers have warmed up? We'll use our validation data and monitor when performance starts to plateau. Once that happens we can move on to the full training. If you have two GPUs in your machine then this is very easy to do: we'll just start the test script on the other GPU using the CUDA_VISIBLE_DEVICES variable:

$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/val* \
--save_dir $EXPERIMENT_DIR/logdir/finetune/val_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir/finetune \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 20 \
--batches 30 \
--eval_interval_secs 180

This will evaluate the latest checkpoint of the model every 180 seconds. This matches the frequency at which the training script saves the model, which is set with SAVE_INTERVAL_SECS in the training configuration file. Note how (--batch_size * --batches) = 600. This means we will evaluate all of the images in the validation set each time.

If you don't have a second GPU in your machine then you have a few options:

scp the checkpoint files to another machine and run the evaluation script on that machine. Then scp the summary file into the $EXPERIMENT_DIR/logdir/finetune/val_summaries directory.
Similar to the option above, you can set up a nfs and have another machine mount the file system so that it can access the checkpoint files and write the summary files into the $EXPERIMENT_DIR/logdir/finetune/val_summaries directory.
Alternate between training and evaluating on the same machine. This is probably easiest done with a bash script that alternates between calling the train.py script and the test.py script in a while loop. You can use the --max_number_of_steps command line option with the train.py script to force it to stop after X steps.
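For the last option, here is a rough sketch of the alternating loop, written in Python rather than bash. It leans on several assumptions that I have not verified against the scripts: that --max_number_of_steps is a cumulative step target, that train.py resumes from the latest checkpoint in --logdir when restarted, and that test.py evaluates once and exits when --eval_interval_secs is omitted. The helper names and the extra_train_args parameter are hypothetical.

```python
import glob
import os
import subprocess


def train_cmd(dataset_dir, experiment_dir, max_steps, extra_train_args=()):
    """Build the warm up train.py command, stopping at `max_steps` (assumed cumulative)."""
    return ["python", "train.py",
            "--tfrecords", *sorted(glob.glob(os.path.join(dataset_dir, "train*"))),
            "--logdir", os.path.join(experiment_dir, "logdir", "finetune"),
            "--config", os.path.join(experiment_dir, "config_train.yaml"),
            "--max_number_of_steps", str(max_steps),
            *extra_train_args]


def eval_cmd(dataset_dir, experiment_dir):
    """Build a test.py command that covers the 600 validation images (20 x 30)."""
    logdir = os.path.join(experiment_dir, "logdir", "finetune")
    return ["python", "test.py",
            "--tfrecords", *sorted(glob.glob(os.path.join(dataset_dir, "val*"))),
            "--save_dir", os.path.join(logdir, "val_summaries"),
            "--checkpoint_path", logdir,
            "--config", os.path.join(experiment_dir, "config_test.yaml"),
            "--batch_size", "20",
            "--batches", "30"]


def alternate(dataset_dir, experiment_dir, steps_per_round, total_steps,
              extra_train_args=(), run=subprocess.check_call):
    """Train for `steps_per_round` steps, evaluate, and repeat up to `total_steps`."""
    for max_steps in range(steps_per_round, total_steps + 1, steps_per_round):
        run(train_cmd(dataset_dir, experiment_dir, max_steps, extra_train_args))
        run(eval_cmd(dataset_dir, experiment_dir))
```

The remaining warm up flags (--pretrained_model, the scope flags, the learning rate settings) would be passed through extra_train_args. Passing run=print is a cheap way to inspect the commands before letting the loop execute anything.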

We can now turn on TensorBoard and watch the training progress:

$ tensorboard --logdir=$EXPERIMENT_DIR/logdir --port=6006

Here is a screen shot from TensorBoard of the warm up training session:

TensorBoard Warm up Training

Once our validation performance plateaus, we can move on to the full training phase. I stopped the training script after 6,174 iterations. My validation accuracy was at 63.67%. I could have been a little more patient and waited to see if the validation accuracy increased with more time. However, I don't think it's necessary for the new weights to be "perfect" at this point. We just want to get them pointed in the right direction.
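I judged the plateau by eye in TensorBoard, but if you want something more mechanical, a simple check along these lines is one option. The function name, the patience and min_delta parameters, and the accuracy values in the example are all made up for illustration; they are not numbers from my run.

```python
def has_plateaued(accuracies, patience=5, min_delta=0.01):
    """True if the last `patience` evaluations failed to beat the earlier best
    validation accuracy by at least `min_delta`."""
    if len(accuracies) <= patience:
        return False  # not enough history to decide yet
    best_before = max(accuracies[:-patience])
    best_recent = max(accuracies[-patience:])
    return best_recent < best_before + min_delta


# Made-up validation accuracies, one per evaluation.
history = [0.31, 0.48, 0.57, 0.61, 0.630, 0.634, 0.636, 0.635, 0.637, 0.636]
print(has_plateaued(history[:6]))  # False
print(has_plateaued(history))      # True
```

A looser min_delta or a shorter patience stops earlier, which is fine for the warm up phase since the new weights only need to be pointed in the right direction.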

Full training

To start the full training we'll simply load in the warmed up model using the --pretrained_model flag. We want to load and train all weights, so we don't need to use the --checkpoint_exclude_scopes or the --trainable_scopes command line flags.

$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $EXPERIMENT_DIR/logdir/finetune

How will we know when the model is done training? We'll watch the performance on the validation set again. Once performance plateaus, we'll stop the training script. If you see the validation performance start to decrease, then you have let the training script run for too long.

$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/val* \
--save_dir $EXPERIMENT_DIR/logdir/val_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 20 \
--batches 30 \
--eval_interval_secs 180

Again, we can monitor the training via TensorBoard:

$ tensorboard --logdir=$EXPERIMENT_DIR/logdir --port=6006

Here is a screen shot from TensorBoard of the full training session: TODO: Insert tensorboard screen shots

Testing

After monitoring the validation performance and using it to decide when to stop the training script, we are ready to test the model. We can test by doing:

$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/test* \
--save_dir $EXPERIMENT_DIR/logdir/test_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 1 \
--batches 5794

We want to test the model against all of the images, so we set the batch size to 1 and the number of batches to 5794. The number 5794 does not have convenient factors, hence the batch size of 1. This command will print out the test accuracy to the command line and will save the summaries to the $EXPERIMENT_DIR/logdir/test_summaries directory.
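To see why 5794 is awkward, the short sketch below lists every batch size (up to an arbitrary cap of 64) that divides a dataset evenly, together with the matching number of batches. The helper name batch_layouts is my own.

```python
def batch_layouts(num_images, max_batch_size=64):
    """Return (batch_size, batches) pairs that cover num_images exactly."""
    return [(b, num_images // b)
            for b in range(1, max_batch_size + 1)
            if num_images % b == 0]


print(batch_layouts(600))   # many options, including (20, 30)
print(batch_layouts(5794))  # [(1, 5794), (2, 2897)]
```

The validation set of 600 images has plenty of even splits, which is why 20 x 30 works there. For the test set the only alternative to a batch size of 1 is a batch size of 2 with 2897 batches.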

With this we are done!

Knobs and Buttons: Playing with the Hyper-parameters

The config file currently has the BATCH_SIZE set to 32. This seems to be the standard batch size for these larger classification networks. If you have a GPU with a lot of memory in your machine, then you can experiment with increasing the BATCH_SIZE. You can use the --batch_size command line flag to set it rather than change the config file. There is a balance between the size of the batch and the time it takes to run one training step. A larger batch size will take more time to process, but you may get a better trajectory to a minimum.

TODO: discuss learning rate, learning rate decay, batch norm decay, etc.
