CUB 200 Image Classification
In this tutorial we are going to train a classification model using the CUB-200-2011 dataset. This dataset contains 200 species of birds, each with roughly 30 training images and 30 testing images, and has become a staple for testing new ideas for fine-grained visual classification.
A few research papers claim that they can get over 80% accuracy on this dataset using only the images; no bounding boxes or part annotations are needed. Jaderberg et al. claim that they can achieve 82.3% with the Inception-V2 architecture and Krause et al. claim that they can get 84.4% accuracy with the Inception-V3 architecture. In this tutorial we'll use the Inception-V3 architecture and see if we can reproduce the 84.4% accuracy.
You can find the dataset website here. The dataset files are relatively small (about 1.3 GB when untarred) and should easily fit on your machine.
$ wget http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
$ tar -xzf CUB_200_2011.tgz
We will use the tfrecords repo to create the tfrecord files that we can use to train and test the model. You'll need to clone that repo:
$ cd ~/code
$ git clone https://github.com/visipedia/tfrecords.git
Before we can call the create() method in create_tfrecords.py we will need to format the CUB data. You can find a script for doing this formatting here. Fire up an IPython terminal and %cpaste that script into the terminal. Now we can format the CUB dataset:
# make sure you have %cpasted the script
# Change these paths to match the location of the CUB dataset on your machine
cub_dataset_dir = "/media/drive2/datasets/CUB_200_2011"
cub_image_dir = "/media/drive2/datasets/CUB_200_2011/images"
# we need to create a file containing the size of each image in the dataset.
# you only need to do this once. scipy is required for this method.
# Alternatively, you can create this file yourself.
# Each line of the file should have <image_id> <width> <height>
create_image_sizes_file(cub_dataset_dir, cub_image_dir)
# Now we can create the datasets
train, test = format_dataset(cub_dataset_dir, cub_image_dir)
train, val = create_validation_split(train, fraction_per_class=0.1, shuffle=True)
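The create_image_sizes_file call above needs scipy. If you would rather create the image sizes file yourself, here is a minimal sketch. It reads CUB's images.txt, which lists one <image_id> <relative_path> pair per line, and uses Pillow (assumed installed) to read each image's dimensions. The function name write_image_sizes_file and the output path are hypothetical; check the formatting script for the filename that create_image_sizes_file writes, since format_dataset presumably reads that file.

```python
import os
from typing import Callable, Tuple


def _pillow_size(path: str) -> Tuple[int, int]:
    """Return (width, height) of an image using Pillow (assumed to be installed)."""
    from PIL import Image  # imported lazily so callers can inject their own reader
    with Image.open(path) as img:
        return img.size  # Pillow reports (width, height)


def write_image_sizes_file(cub_dataset_dir: str,
                           cub_image_dir: str,
                           output_path: str,
                           get_size: Callable[[str], Tuple[int, int]] = _pillow_size) -> int:
    """Hypothetical helper: write one '<image_id> <width> <height>' line per image.

    Reads CUB's images.txt ('<image_id> <relative_path>' per line) and
    returns the number of lines written.
    """
    count = 0
    with open(os.path.join(cub_dataset_dir, "images.txt")) as src, \
            open(output_path, "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            image_id, rel_path = line.split(maxsplit=1)
            width, height = get_size(os.path.join(cub_image_dir, rel_path))
            dst.write(f"{image_id} {width} {height}\n")
            count += 1
    return count
```

The get_size parameter lets you swap in a different image reader without touching the file-writing logic.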
We have created three arrays holding train, validation and test data. The CUB-200 dataset does not come with a standard validation set, so we took 10% of the train data and created a validation set. The number of elements in each array should be:
- Number of train images: 5394
- Number of validation images: 600
- Number of test images: 5794
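The validation split is taken per class, which keeps all 200 species represented in both train and validation. Here is a minimal sketch of that idea as a hypothetical split_per_class function, not the script's actual create_validation_split. It assumes each record stores its integer label under record["class"]["label"], which you should check against the formatting script. With roughly 30 training images per class, 10% per class works out to about 3 images per species (3 × 200 = 600).

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def split_per_class(records: List[Dict], fraction_per_class: float = 0.1,
                    shuffle: bool = True, seed: int = 0) -> Tuple[List[Dict], List[Dict]]:
    """Hypothetical stand-in for create_validation_split.

    Assumes each record stores its label at record["class"]["label"].
    Moves round(fraction_per_class * n) images of each class into validation.
    """
    by_label = defaultdict(list)
    for record in records:
        by_label[record["class"]["label"]].append(record)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        if shuffle:
            rng.shuffle(items)
        n_val = int(round(fraction_per_class * len(items)))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```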
We can now pass these arrays to the create() method:
from create_tfrecords import create
# Change this path
dataset_dir = "/media/drive2/tensorflow_datasets/cub/with_600_val_split/"
train_errors = create(dataset=train, dataset_name="train", output_directory=dataset_dir,
                      num_shards=10, num_threads=2, shuffle=True)
val_errors = create(dataset=val, dataset_name="val", output_directory=dataset_dir,
                    num_shards=4, num_threads=2, shuffle=True)
test_errors = create(dataset=test, dataset_name="test", output_directory=dataset_dir,
                     num_shards=10, num_threads=2, shuffle=True)
We now have a dataset directory containing tfrecord files prefixed with either train, val or test that we can use to train and test a model.
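Before training, it can be worth confirming that each split ended up with the expected number of records. The sketch below counts records by walking the TFRecord framing directly (an 8-byte little-endian length, a 4-byte CRC, the payload, and another 4-byte CRC), so it needs only the Python standard library. It skips the CRC checks, so it catches truncated files but not corrupted ones. The function names are hypothetical.

```python
import glob
import os
import struct


def count_tfrecord_records(path: str) -> int:
    """Count records in one TFRecord file without TensorFlow (CRCs are not verified)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated record header in {path}")
            (length,) = struct.unpack("<Q", header)
            body = f.read(4 + length + 4)  # length CRC + payload + payload CRC
            if len(body) < 4 + length + 4:
                raise ValueError(f"truncated record in {path}")
            count += 1
    return count


def count_split(dataset_dir: str, prefix: str) -> int:
    """Sum record counts over all shards whose filename starts with prefix."""
    pattern = os.path.join(dataset_dir, prefix + "*")
    return sum(count_tfrecord_records(p) for p in sorted(glob.glob(pattern)))
```

For example, count_split(os.environ["DATASET_DIR"], "train") should return 5394 if all training shards were written completely (600 for "val" and 5794 for "test").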
I'll assume that the path to the dataset directory is stored in the DATASET_DIR environment variable for the rest of the tutorial. For example:
$ export DATASET_DIR=/media/drive2/tensorflow_datasets/cub/with_600_val_split
We'll store all the experiment files in a directory called cub_image_experiment/. Create the following directory structure:
- cub_image_experiment/
  - logdir/
    - val_summaries/
    - test_summaries/
    - finetune/
      - val_summaries/
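If you prefer to script this, here is a minimal sketch that creates the same layout (equivalent to running mkdir -p on each leaf directory). The function name is hypothetical; the leaf directories are the ones the train.py and test.py commands later in this tutorial write to.

```python
import os


def make_experiment_dirs(experiment_dir: str) -> None:
    """Create the logdir layout used by the train/test commands in this tutorial."""
    for sub in ("logdir/val_summaries",
                "logdir/test_summaries",
                "logdir/finetune/val_summaries"):
        # exist_ok=True makes the call safe to re-run; parents are created as needed.
        os.makedirs(os.path.join(experiment_dir, sub), exist_ok=True)
```

For example, make_experiment_dirs("cub_image_experiment") run from the directory where you keep your experiments.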
I'll assume that the path to cub_image_experiment is stored in the EXPERIMENT_DIR environment variable for the rest of the tutorial. For example:
$ export EXPERIMENT_DIR=/media/drive2/tensorflow_experiments/ebird/cub_image_experiment
We'll need two configuration files:
- config_train.yaml: This will contain all the configurations for image augmentation, the optimizer, the learning rate, model regularization, snapshotting the model and a few other things.
- config_test.yaml: This will contain only the necessary configurations to test a model. It is essentially a subset of the training configurations with the image augmentations turned off.
The train configuration file can be found here, and the test configuration file can be found here. You should copy these two configuration files and put them into the cub_image_experiment directory. I have renamed them config_train.yaml and config_test.yaml respectively. A few important parameters are the following:
Config Name | Value | Description |
---|---|---|
NUM_CLASSES | 200 | The number of classes (i.e. bird species) in the CUB dataset. |
NUM_TRAIN_EXAMPLES | 5394 | The number of training images in our training tfrecords. We need to know this number so that we can calculate the size of an epoch, which is the number of iterations (i.e. batches) it takes to go through all of the training data. |
NUM_TRAIN_ITERATIONS | 24000 | The maximum number of iterations to execute before stopping. This is more of a convenience parameter than anything and allows us to stop the training at a specific point. |
BATCH_SIZE | 32 | The number of images to process in one iteration. If you find that your machine is running out of memory then you can make this value smaller (e.g. 16 or 8, etc.). |
MODEL_NAME | inception_v3 | We are going to use the Inception V3 architecture as our model. |
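NUM_TRAIN_EXAMPLES and BATCH_SIZE together determine the size of an epoch. The sketch below does that arithmetic for the values in the table. It assumes a partial final batch still counts as one iteration (rounding down instead would give 168 iterations per epoch here), which may differ from how the training script counts. The function name is hypothetical.

```python
import math
from typing import Tuple


def schedule_summary(num_train_examples: int, batch_size: int,
                     num_train_iterations: int) -> Tuple[int, float]:
    """Return (iterations per epoch, epochs covered by num_train_iterations)."""
    # Assumption: a partial final batch still counts as one iteration, so round up.
    iterations_per_epoch = math.ceil(num_train_examples / batch_size)
    epochs = num_train_iterations * batch_size / num_train_examples
    return iterations_per_epoch, epochs


per_epoch, epochs = schedule_summary(5394, 32, 24000)
print(per_epoch, round(epochs, 1))  # 169 142.4
```

So NUM_TRAIN_ITERATIONS = 24000 corresponds to roughly 142 passes over the training data at BATCH_SIZE 32; halving the batch size doubles the iterations per epoch.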
I also like to create a file called cmds.txt where I compose the commands for running the different scripts. We'll add commands to this file as we go. Here is an example cmds.txt file.
Your experiment directory should now look like:
- cub_image_experiment/
  - logdir/
    - val_summaries/
    - test_summaries/
    - finetune/
      - val_summaries/
  - cmds.txt
  - config_train.yaml
  - config_test.yaml
It is always a good idea to visualize the inputs to the network before starting the training process. This helps identify problems with the data or with your configuration files.
Install the tf_classification repo if you haven't done so already. cd to the tf_classification repo and execute the following command:
$ CUDA_VISIBLE_DEVICES=1 python visualize_train_inputs.py \
--tfrecords $DATASET_DIR/train* \
--config $EXPERIMENT_DIR/config_train.yaml \
--text_labels
The CUDA_VISIBLE_DEVICES=1 environment variable specifies that this process should only use the GPU with id 1 (inside the process, that GPU is renumbered as device 0). This prevents TensorFlow from allocating memory on all of the GPUs in the machine.
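The renumbering is the part that is easy to get wrong: the GPUs listed in CUDA_VISIBLE_DEVICES are renumbered from 0 inside the process, in the order listed. The sketch below models that mapping (it ignores GPU UUID entries and CUDA's handling of invalid indices) and shows that the variable can also be set from Python, as long as that happens before TensorFlow is imported. Note also that CUDA orders GPUs fastest-first by default, which may not match the numbering nvidia-smi shows; setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes them agree. The function name is hypothetical.

```python
import os
from typing import Dict


def visible_device_map(cuda_visible_devices: str) -> Dict[int, int]:
    """Map the device index a process sees to the physical GPU index.

    Simplified model: a comma-separated list of physical indices,
    renumbered from 0 in the order listed.
    """
    physical = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    return {logical: phys for logical, phys in enumerate(physical)}


# Must run before TensorFlow (or anything else using CUDA) is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
print(visible_device_map(os.environ["CUDA_VISIBLE_DEVICES"]))  # {0: 1}
```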
You should see a matplotlib window open that shows you the original image, and the image after all augmentations have been performed. At this point we want to do a sanity check on the augmentations. Typically we don't want the augmentations to be so aggressive that they change the label of the image. For example, when doing a random crop of the image with certain parameter settings, it is possible that the crop does not contain any part of the bird; it could just contain a patch of sky. You can adjust the parameters of the augmentations until you are satisfied with the resulting visualizations.
With that said, I have seen better performance on the CUB-200 dataset when I allow more aggressive augmentations. For example, setting IMAGE_PROCESSING.RANDOM_CROP_CFG.MIN_AREA to 0.1 rather than 0.6 achieves higher test time accuracy. A possible explanation for this is that the small background crops act as another source of regularization, encouraging the model to ignore these regions and focus more on the bird. It could also be due to the small size of the CUB-200 dataset. However, this deserves a more thorough analysis.
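To get a feel for what MIN_AREA controls, here is an illustrative sketch of an Inception-style random crop: sample a target area between MIN_AREA and 100% of the image, sample an aspect ratio, and place the crop at random. This is not necessarily the crop procedure tf_classification uses; the aspect-ratio range, the 10-attempt fallback to the full image, and the bird box in the example are all assumptions. For a 500 × 375 image with the centered bird box below, crops with MIN_AREA = 0.6 always overlap the box, while MIN_AREA = 0.1 produces some crops that miss it entirely.

```python
import math
import random
from typing import Optional, Tuple


def sample_crop(width: int, height: int, min_area: float,
                aspect_range: Tuple[float, float] = (3 / 4, 4 / 3),
                max_attempts: int = 10,
                rng: Optional[random.Random] = None) -> Tuple[int, int, int, int]:
    """Return a crop (x, y, w, h) covering roughly min_area to 100% of the image.

    Falls back to the full image if no crop fits within max_attempts tries.
    """
    rng = rng or random.Random()
    image_area = width * height
    for _ in range(max_attempts):
        target_area = rng.uniform(min_area, 1.0) * image_area
        aspect = rng.uniform(*aspect_range)  # crop width / crop height
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    return 0, 0, width, height


def miss_rate(min_area: float, bird_box: Tuple[int, int, int, int],
              width: int = 500, height: int = 375,
              trials: int = 10000, seed: int = 0) -> float:
    """Fraction of sampled crops that do not overlap bird_box (x, y, w, h) at all."""
    rng = random.Random(seed)
    bx, by, bw, bh = bird_box
    misses = 0
    for _ in range(trials):
        x, y, w, h = sample_crop(width, height, min_area, rng=rng)
        overlaps = x < bx + bw and bx < x + w and y < by + bh and by < y + h
        misses += not overlaps
    return misses / trials


# Hypothetical 100 x 75 bird box in the middle of a 500 x 375 image.
BIRD_BOX = (200, 150, 100, 75)
# The first value is 0.0; the second is small but nonzero.
print(miss_rate(0.6, BIRD_BOX), miss_rate(0.1, BIRD_BOX))
```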
The CUB-200 dataset is not large enough to train the Inception V3 architecture from scratch; the model would overfit to the training data and have poor generalization performance. Luckily, there are pretrained networks available. These networks have been trained on the ImageNet 2012 classification challenge with 1.2 million images, and their trained weights have been made publicly available. We can download the checkpoint file for the pretrained Inception V3 model:
$ wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
$ tar -xzf inception_v3_2016_08_28.tar.gz
I'll assume that the path to the checkpoint file is stored in the IMAGENET_PRETRAINED_MODEL environment variable. For example:
$ export IMAGENET_PRETRAINED_MODEL=/media/drive3/tensorflow_models/inception_v3.ckpt
Using a pretrained ImageNet model on the CUB-200 dataset requires us to replace the final classification layer. This is required because that layer is specific to the dataset (and specifically the number of classes in that dataset) that was used to train the model. The rest of the layers can be treated as dataset agnostic. Therefore, to train the model to classify the 200 species of birds, we want to load in all the weights from the pretrained ImageNet model except for the weights corresponding to the final classification layer. Those weights will be initialized with random values.
A training protocol that typically leads to good performance is to first train only those new layers that get initialized with random values, leaving the rest of the weights fixed. Once those new layers have "warmed up" then we will train all of the weights together. However, this "warm up" step is not necessary and we could just train all of the weights together from the get go. From my experience, this "warm up" step does not always lead to better performance, but I have not seen it lead to worse performance when compared to a model that was trained without the "warm up" step.
For the "warm up" training step we can use the --checkpoint_exclude_scopes flag to load in all weights except the final fully connected layers and then use the --trainable_scopes flag to only train those new layers. The command looks like:
$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir/finetune \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $IMAGENET_PRETRAINED_MODEL \
--trainable_scopes InceptionV3/Logits InceptionV3/AuxLogits \
--checkpoint_exclude_scopes InceptionV3/Logits InceptionV3/AuxLogits \
--learning_rate_decay_type fixed \
--lr 0.01
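Conceptually, the two flags select variables by name prefix. The sketch below shows that selection for the warm-up command: everything under InceptionV3/Logits and InceptionV3/AuxLogits is skipped when restoring from the ImageNet checkpoint, and those are the only variables the optimizer updates. It illustrates the general technique (TF-Slim's fine-tuning script, for example, filters variables by name prefix), not tf_classification's train.py itself; the helper names and example variable names are illustrative.

```python
from typing import Iterable, List, Sequence


def _in_scopes(name: str, scopes: Sequence[str]) -> bool:
    # A variable is in a scope if its name equals the scope or starts with "scope/".
    return any(name == s or name.startswith(s.rstrip("/") + "/") for s in scopes)


def variables_to_restore(names: Iterable[str], exclude_scopes: Sequence[str]) -> List[str]:
    """Names loaded from the pretrained checkpoint: everything not excluded."""
    return [n for n in names if not _in_scopes(n, exclude_scopes)]


def variables_to_train(names: Iterable[str], trainable_scopes: Sequence[str]) -> List[str]:
    """Names the optimizer updates; no scopes means every variable is trainable."""
    names = list(names)
    if not trainable_scopes:
        return names
    return [n for n in names if _in_scopes(n, trainable_scopes)]


# Illustrative variable names in the style of TF-Slim's Inception V3.
EXAMPLE_NAMES = [
    "InceptionV3/Conv2d_1a_3x3/weights",
    "InceptionV3/Mixed_7c/Branch_0/Conv2d_0a_1x1/weights",
    "InceptionV3/AuxLogits/Conv2d_2b_1x1/weights",
    "InceptionV3/Logits/Conv2d_1c_1x1/weights",
]
WARMUP_SCOPES = ["InceptionV3/Logits", "InceptionV3/AuxLogits"]

print(variables_to_restore(EXAMPLE_NAMES, WARMUP_SCOPES))  # the two backbone variables
print(variables_to_train(EXAMPLE_NAMES, WARMUP_SCOPES))    # the two new logits variables
```

Unlike a bare startswith, this requires a "/" after the scope, so InceptionV3/Logits would not accidentally match a scope named InceptionV3/LogitsExtra. With both scope lists empty, as in the full training phase, every variable is restored and trained.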
For the warm up phase we won't worry about decaying the learning rate and instead opt to fix it at a relatively high value. How will we know when the new layers have warmed up? We'll use our validation data and monitor when performance starts to plateau. Once that happens we can move on to the full training. If you have two GPUs in your machine then this is very easy to do: we'll just start the test script on the other GPU using the CUDA_VISIBLE_DEVICES variable:
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/val* \
--save_dir $EXPERIMENT_DIR/logdir/finetune/val_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir/finetune \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 20 \
--batches 30 \
--eval_interval_secs 180
This will evaluate the latest checkpoint of the model every 180 seconds. This matches how often the training script saves the model, which is set with SAVE_INTERVAL_SECS in the training configuration file. Note that (--batch_size * --batches) = 20 * 30 = 600, the size of the validation set. This means we will evaluate all of the images in the validation set each time.
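The polling behavior described above can be sketched as a loop that checks for the newest checkpoint, evaluates it once if it has not been seen before, and then sleeps. This is not test.py's actual implementation and the helper names are hypothetical. In TensorFlow, tf.train.latest_checkpoint(checkpoint_dir) is one way to supply the latest_checkpoint callable, and evaluate would run --batches batches of --batch_size images through the model.

```python
import time
from typing import Callable, List, Optional, Tuple


def evaluation_loop(latest_checkpoint: Callable[[], Optional[str]],
                    evaluate: Callable[[str], float],
                    eval_interval_secs: float = 180,
                    max_evaluations: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep) -> List[Tuple[str, float]]:
    """Evaluate each new checkpoint once, polling every eval_interval_secs seconds.

    Runs forever when max_evaluations is None, like a long-running eval job.
    """
    last_seen: Optional[str] = None
    results: List[Tuple[str, float]] = []
    while max_evaluations is None or len(results) < max_evaluations:
        ckpt = latest_checkpoint()
        if ckpt is not None and ckpt != last_seen:
            # New checkpoint: evaluate it (e.g. 30 batches of 20 validation images).
            results.append((ckpt, evaluate(ckpt)))
            last_seen = ckpt
        sleep(eval_interval_secs)
    return results
```

Because only the newest checkpoint is evaluated, any checkpoints saved more often than eval_interval_secs are skipped, which is why matching the interval to SAVE_INTERVAL_SECS is convenient.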
If you don't have a second GPU in your machine then you have a few options:
- scp the checkpoint files to another machine and run the evaluation script on that machine. Then scp the summary file into the $EXPERIMENT_DIR/logdir/finetune/val_summaries directory.
- Similar to the option above, you can set up an NFS share and have another machine mount the file system so that it can access the checkpoint files and write the summary files into the $EXPERIMENT_DIR/logdir/finetune/val_summaries directory.
- Alternate between training and evaluating on the same machine. This is probably easiest done with a bash script that alternates between calling the train.py script and the test.py script in a while loop. You can use the --max_number_of_steps command line option with the train.py script to force it to stop after X steps.
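For the third option, here is a sketch of the alternating loop written in Python with subprocess instead of bash. It rebuilds the warm-up train.py and validation test.py commands from this tutorial and raises the step limit each cycle; run it from the tf_classification directory. Two assumptions are worth checking against the scripts before relying on it: that --max_number_of_steps is a global-step target and train.py resumes from the latest checkpoint in --logdir when restarted (rather than reloading --pretrained_model), and that test.py exits after a single evaluation when --eval_interval_secs is not given. The function names are hypothetical, and dry_run=True only builds the commands.

```python
import glob
import os
import subprocess
from typing import List


def _shards(dataset_dir: str, prefix: str) -> List[str]:
    # Python equivalent of the shell glob $DATASET_DIR/<prefix>*
    return sorted(glob.glob(os.path.join(dataset_dir, prefix + "*")))


def warmup_cycle_commands(step_target: int) -> List[List[str]]:
    """Build the train.py and test.py argv lists for one train-then-evaluate cycle."""
    dataset_dir = os.environ["DATASET_DIR"]
    experiment_dir = os.environ["EXPERIMENT_DIR"]
    logdir = os.path.join(experiment_dir, "logdir", "finetune")
    train = ["python", "train.py",
             "--tfrecords", *_shards(dataset_dir, "train"),
             "--logdir", logdir,
             "--config", os.path.join(experiment_dir, "config_train.yaml"),
             "--pretrained_model", os.environ["IMAGENET_PRETRAINED_MODEL"],
             "--trainable_scopes", "InceptionV3/Logits", "InceptionV3/AuxLogits",
             "--checkpoint_exclude_scopes", "InceptionV3/Logits", "InceptionV3/AuxLogits",
             "--learning_rate_decay_type", "fixed",
             "--lr", "0.01",
             # Assumption: a global-step target, so it must grow every cycle.
             "--max_number_of_steps", str(step_target)]
    test = ["python", "test.py",
            "--tfrecords", *_shards(dataset_dir, "val"),
            "--save_dir", os.path.join(logdir, "val_summaries"),
            "--checkpoint_path", logdir,
            "--config", os.path.join(experiment_dir, "config_test.yaml"),
            "--batch_size", "20",
            "--batches", "30"]  # assumption: no --eval_interval_secs means one evaluation
    return [train, test]


def run_cycles(step_increment: int, num_cycles: int, dry_run: bool = True) -> List[List[str]]:
    """Alternate training and validation; with dry_run=True, only return the commands."""
    commands = []
    for cycle in range(1, num_cycles + 1):
        for argv in warmup_cycle_commands(cycle * step_increment):
            commands.append(argv)
            if not dry_run:
                subprocess.run(argv, check=True)  # stop on the first failing command
    return commands
```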
We can now turn on TensorBoard and watch the training progress:
$ tensorboard --logdir=$EXPERIMENT_DIR/logdir --port=6006
Here is a screen shot from TensorBoard of the warm up training session:
Once our validation performance plateaus, we can move on to the full training phase. I stopped the training script after 6,174 iterations, when my validation accuracy was at 63.67%. I could have been a little more patient and waited to see if the validation accuracy increased with more time. However, I don't think it's necessary for the new weights to be "perfect" at this point; we just want to get them pointed in the right direction.
To start the full training we'll simply load in the warmed up model using the --pretrained_model flag. We want to load and train all weights, so we don't need the --checkpoint_exclude_scopes or --trainable_scopes command line flags.
$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $EXPERIMENT_DIR/logdir/finetune
How will we know when the model is done training? We'll watch the performance on the validation set again. Once performance plateaus, we'll stop the training script. If you see the validation performance start to decrease, then you have let the training script run for too long.
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/val* \
--save_dir $EXPERIMENT_DIR/logdir/val_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 20 \
--batches 30 \
--eval_interval_secs 180
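Deciding when performance has plateaued is a judgment call, but it can be made explicit. The sketch below flags a plateau when the last few evaluations fail to beat the earlier best by a small margin, and picks the checkpoint with the highest validation accuracy, which is the one to keep if accuracy has started to decline. The patience and min_delta defaults are arbitrary assumptions, not values from this tutorial, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def has_plateaued(accuracies: Sequence[float], patience: int = 3,
                  min_delta: float = 0.005) -> bool:
    """True if none of the last `patience` evaluations beat the earlier best by min_delta."""
    if len(accuracies) <= patience:
        return False  # not enough history to judge yet
    best_before = max(accuracies[:-patience])
    return max(accuracies[-patience:]) < best_before + min_delta


def best_checkpoint(results: Sequence[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (checkpoint, accuracy) pair with the highest validation accuracy."""
    return max(results, key=lambda pair: pair[1])
```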
Again, we can monitor the training via TensorBoard:
$ tensorboard --logdir=$EXPERIMENT_DIR/logdir --port=6006
Here is a screen shot from TensorBoard of the full training session: TODO: Insert tensorboard screen shots
After using the validation performance to decide when to stop training, we are ready to test the model. We can test by doing:
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/test* \
--save_dir $EXPERIMENT_DIR/logdir/test_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 1 \
--batches 5794
We want to test the model against all of the images, so we set the batch size to 1 and the number of batches to 5794. The number 5794 does not have convenient factors (5794 = 2 * 2897, and 2897 is prime), hence the batch size of 1. This command will print out the test accuracy to the command line and will save the summaries to the $EXPERIMENT_DIR/logdir/test_summaries directory.
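The sketch below enumerates every (batch_size, batches) pair that covers a set of images exactly once, up to a maximum batch size. It confirms the 20 × 30 split used for the 600 validation images and shows why the test set leaves little choice. The function name is hypothetical.

```python
from typing import List, Tuple


def exact_batch_plans(num_images: int, max_batch_size: int = 64) -> List[Tuple[int, int]]:
    """All (batch_size, batches) pairs with batch_size * batches == num_images."""
    return [(b, num_images // b)
            for b in range(1, max_batch_size + 1)
            if num_images % b == 0]


print((20, 30) in exact_batch_plans(600))  # True: the validation setting
print(exact_batch_plans(5794))             # [(1, 5794), (2, 2897)]
```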
With this we are done!
The config file currently has BATCH_SIZE set to 32. This seems to be the standard batch size for these larger classification networks. If you have a GPU with a lot of memory in your machine, then you can experiment with increasing BATCH_SIZE. You can use the --batch_size command line flag to set it rather than changing the config file. There is a balance between the size of the batch and the time it takes to run one training step. A larger batch takes more time to process per step, but it averages the gradient over more images, so you may get a better trajectory toward a minimum.
TODO: discuss learning rate, learning rate decay, batch norm decay, etc.