This post describes the method and tips I picked up from participating in the [Dog Breed challenge] on Kaggle. I managed to get a final score of 0.13783, which put me in 158th place. Considering that a lot of competitors leveraged a third-party dataset (which already contains the test data), I believe my approach is worth sharing, because nothing is used except the pretrained ImageNet models. (However, there is an ensemble :O)

[Dog Breed challenge]: http://www.kaggle.com/c/dog-breed-identification
import os
os.chdir(r'D:\Machine Learning\Kaggle\Dog Breed Identification\pytorch')
I first delved into this problem using Keras. However, I could not find powerful pretrained models like NASNet in Keras. Then I found this awesome package of [Pytorch pretrained models]; all the models I tried come from this package.

[Pytorch pretrained models]: http://github.com/Cadene/pretrained-models.pytorch
from PIL import Image
import torch
from torch.utils.data import Dataset,DataLoader,TensorDataset,ConcatDataset
from torchvision import transforms as trans
from torchvision import models,utils
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
from tqdm import tqdm
import pretrainedmodels
from torch import nn
from torch import optim
from torch.autograd import Variable
A dog image reading class is created using the PyTorch Dataset API. Please refer to the file for details; a rough sketch follows the import below.
from dataset.dataset import Dogs
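The actual Dogs class lives in dataset/dataset.py and is not shown in this post. For the curious, here is a minimal sketch of what such a class could look like, assuming labels.csv maps image ids to breed names and the images are stored as '<id>.jpg'. The constructor mirrors the calls further below; the resize flag handling and the label encoding are my assumptions, not the real implementation.

# A minimal sketch, NOT the actual dataset/dataset.py -- the real file may differ.
class DogsSketch(Dataset):
    def __init__(self, image_folder, df_train, df_test, is_train=True,
                 resize=False, transforms=None):
        self.image_folder = Path(image_folder)
        self.is_train = is_train
        self.transforms = transforms
        self.resize = resize  # the real class presumably pre-resizes images when True
        if is_train:
            self.ids = list(df_train.index)
            breeds = sorted(df_train['breed'].unique())  # the 120 breed names
            breed_to_idx = {b: i for i, b in enumerate(breeds)}
            self.labels = [breed_to_idx[b] for b in df_train['breed']]
        else:
            self.ids = list(df_test.index)

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        img = Image.open(self.image_folder / (self.ids[idx] + '.jpg')).convert('RGB')
        if self.transforms is not None:
            img = self.transforms(img)
        # the test set has no labels, so return the index as a placeholder
        label = self.labels[idx] if self.is_train else idx
        return img, label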
Set all the hyperparameters. You can set batch_size larger if you have enough GPU memory.
work_folder = Path(r'D:\Machine Learning\Kaggle\Dog Breed Identification')
train_image_folder = work_folder/'train'
test_image_folder = work_folder/'test'
bottlenecks_folder = work_folder/'pytorch'/'bottlenecks'
pred_folder = work_folder/'pred'
df_train = pd.read_csv(work_folder/'labels.csv',index_col=0)
df_test = pd.read_csv(work_folder/'sample_submission.csv',index_col=0)
img_size = 331
batch_size = 4
batch_size_top = 4096
use_cuda = torch.cuda.is_available()
date = '0222'
model_name = 'nasnet'
learning_rate = 0.0001
dropout_ratio = 0.5
input_shape = 331
crop_mode = 'center'
use_bias = True
name = '{}__model={}__lr={}__input_shape={}__drop={}__crop_mode={}__bias={}'.format(date,model_name,learning_rate,input_shape,dropout_ratio,crop_mode,use_bias)
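With the settings above, name evaluates to '0222__model=nasnet__lr=0.0001__input_shape=331__drop=0.5__crop_mode=center__bias=True', and it is used to tag the cached bottleneck files and the submission.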
I found out there are two ways to preprocess the differently sized images into the same shape: resize and center cropping. The difference between them is subtle, hence it becomes part of the hyperparameters. The following transforms were tested for image preprocessing; after several checks I can say center cropping gives a better result than resizing. It looks like, at least on this dataset, it is better to keep the original aspect ratio (center crop) than to keep the image margins (resize).
if crop_mode == 'center':
    transforms = trans.Compose([
        trans.Resize(input_shape),
        trans.CenterCrop(input_shape),
        trans.ToTensor(),
        trans.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
elif crop_mode == 'resize':
    transforms = trans.Compose([
        trans.Resize((input_shape, input_shape)),
        trans.ToTensor(),
        trans.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
Create the corresponding datasets and dataloaders.
train_dataset = Dogs(train_image_folder,df_train,df_test,is_train=True,resize=False,transforms=transforms)
test_dataset = Dogs(test_image_folder,df_train,df_test,False,resize=False,transforms=transforms)
train_dataset_resize = Dogs(train_image_folder,df_train,df_test,is_train=True,resize=True,transforms=transforms)
train_loader = DataLoader(train_dataset,batch_size,num_workers=0,shuffle=False)
test_loader = DataLoader(test_dataset,batch_size,num_workers=0,shuffle=False)
We can see the difference between center crop and resize here.
img_center_crop = train_dataset[0][0] * 0.5 + 0.5  # undo the normalization for display
transforms_resize = trans.Compose([
trans.Resize((input_shape,input_shape)),
trans.ToTensor(),
trans.Normalize([0.5, 0.5, 0.5],[0.5, 0.5, 0.5])])
train_dataset_resize = Dogs(train_image_folder,df_train,df_test,is_train=True,resize=False,transforms=transforms_resize)
img_resize = train_dataset_resize[0][0] * 0.5 + 0.5
trans.ToPILImage()(img_center_crop)  # displays the center-cropped image (own notebook cell)
trans.ToPILImage()(img_resize)  # displays the resized image
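To compare them directly, here is a small matplotlib snippet (not in the original notebook) that puts both images side by side:

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(img_center_crop.numpy().transpose(1, 2, 0))
axes[0].set_title('center crop')
axes[1].imshow(img_resize.numpy().transpose(1, 2, 0))
axes[1].set_title('resize')
for ax in axes:
    ax.axis('off')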
The key to transfer learning is getting the bottleneck outputs. Normally we use the second-to-last layer, the one just before the final softmax classifier.
For NASNet in the pretrained package, we can realize this simply by turning the last two layers of the original model into identity mappings.
def get_extraction_model():
    nasnet = pretrainedmodels.nasnetalarge(num_classes=1000)
    nasnet = nasnet.eval()
    # adaptive pooling lets the model accept inputs whose size differs from 331
    nasnet.avg_pool = nn.AdaptiveAvgPool2d(1)
    # replace the dropout and the classifier with identity mappings,
    # so the forward pass returns the 4032-d bottleneck features
    del nasnet.dropout
    del nasnet.last_linear
    nasnet.dropout = lambda x: x
    nasnet.last_linear = lambda x: x
    return nasnet
extraction_nasnet = get_extraction_model()
if use_cuda:
    extraction_nasnet.cuda()
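As a quick sanity check (my addition, not in the original notebook), you can push one batch through the truncated model and confirm that the output is the flat 4032-dimensional feature vector the top classifier expects:

x_sample, _ = next(iter(train_loader))
x_sample = Variable(x_sample, volatile=True)  # inference only, no gradients needed
if use_cuda:
    x_sample = x_sample.cuda()
print(extraction_nasnet(x_sample).size())  # expected: (batch_size, 4032)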
A function to get the bottleneck outputs. Notice that we keep the dataloaders unshuffled, so the outputs stay in the same order as the input files.
def get_bottlenecks(data_loader, extraction_model, test_mode=False):
    x_pieces = []
    y_pieces = []
    for x, y in tqdm(iter(data_loader)):
        # volatile: inference only, no gradients needed
        x = Variable(x, volatile=True)
        if use_cuda:
            x = x.cuda()
        x_pieces.append(extraction_model(x).cpu().data.numpy())
        if not test_mode:
            y_pieces.append(y.numpy())
    bottlenecks_x = np.concatenate(x_pieces)
    bottlenecks_y = np.concatenate(y_pieces) if not test_mode else None
    return bottlenecks_x, bottlenecks_y
bottlenecks_x, bottlenecks_y = get_bottlenecks(train_loader, extraction_nasnet)
# np.save(bottlenecks_folder/(name+'_x'),bottlenecks_x)
# np.save(bottlenecks_folder/(name+'_y'),bottlenecks_y)
# bottlenecks_x = np.load(bottlenecks_folder/(name + '_x.npy'))
# bottlenecks_y = np.load(bottlenecks_folder/(name + '_y.npy'))
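The commented lines above cache the features to disk. A convenience pattern (my assumption of the intent, not in the original) is to load the cache when it exists and compute otherwise:

x_cache = bottlenecks_folder / (name + '_x.npy')
y_cache = bottlenecks_folder / (name + '_y.npy')
if x_cache.exists():
    bottlenecks_x = np.load(x_cache)
    bottlenecks_y = np.load(y_cache)
else:
    bottlenecks_x, bottlenecks_y = get_bottlenecks(train_loader, extraction_nasnet)
    np.save(x_cache, bottlenecks_x)
    np.save(y_cache, bottlenecks_y)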
Delete the model to free GPU memory.
del extraction_nasnet
Create the top classifier: a linear layer whose input is the bottleneck features and whose output is the 120 breed classes.
class TopModule(nn.Module):
    def __init__(self, dropout_ratio):
        super(TopModule, self).__init__()
        # register dropout as a submodule so that model.eval() actually disables it;
        # creating nn.Dropout inside forward() would ignore the train/eval mode
        self.dropout = nn.Dropout(p=dropout_ratio)
        self.aff = nn.Linear(4032, 120, bias=use_bias)

    def forward(self, x):
        x = self.dropout(x)
        x = self.aff(x)
        return x
criterion = nn.CrossEntropyLoss()
if use_cuda:
    criterion = criterion.cuda()
Split the data into training and validation sets.
permutation = np.random.permutation(bottlenecks_x.shape[0])
n_val = bottlenecks_x.shape[0] // 5  # hold out 20% for validation
x_train = bottlenecks_x[permutation][:-n_val]
x_val = bottlenecks_x[permutation][-n_val:]
y_train = bottlenecks_y[permutation][:-n_val]
y_val = bottlenecks_y[permutation][-n_val:]
top_only_train_dataset = TensorDataset(torch.FloatTensor(x_train),torch.LongTensor(y_train))
top_only_val_dataset = TensorDataset(torch.FloatTensor(x_val),torch.LongTensor(y_val))
top_only_train_loader = DataLoader(top_only_train_dataset,batch_size=batch_size_top,shuffle=True)
top_only_val_loader = DataLoader(top_only_val_dataset,batch_size=batch_size_top,shuffle=True)
total_dataset = ConcatDataset([top_only_train_dataset,top_only_val_dataset])
total_loader = DataLoader(total_dataset,batch_size=batch_size_top,shuffle=True)
The training function.
def fit(loader, optimizer, criterion, model, epochs=1500, evaluate=True):
    val_loss_history = []
    val_acc_history = []
    for epoch in range(epochs):
        running_loss = 0.0
        for i, data in enumerate(loader, 0):
            inputs, labels = data
            inputs, labels = Variable(inputs), Variable(labels)
            if use_cuda:
                inputs = inputs.cuda()
                labels = labels.cuda()
            optimizer.zero_grad()
            # forward + backward
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.data[0]
        print('[%d, %5d] Train_loss: %.3f' % (epoch + 1, i + 1, running_loss / len(loader)))
        if evaluate:
            # eval mode disables dropout during validation
            model.eval()
            x_val_v = Variable(torch.FloatTensor(x_val), volatile=True)
            y_val_v = Variable(torch.LongTensor(y_val), volatile=True)
            if use_cuda:
                x_val_v = x_val_v.cuda()
                y_val_v = y_val_v.cuda()
            outputs = model(x_val_v)
            loss = criterion(outputs, y_val_v)
            _, pred = torch.max(outputs, 1)
            val_acc = np.mean(pred.cpu().data.numpy() == y_val_v.cpu().data.numpy())
            val_loss_history.append(loss.data[0])
            val_acc_history.append(val_acc)
            print('[%d] Val_loss: %.3f' % (epoch + 1, loss.data[0]))
            print('[%d] Val_acc: %.3f' % (epoch + 1, val_acc))
            model.train()
    print('Finished Training')
    return val_loss_history, val_acc_history
top_only_model = TopModule(dropout_ratio)
if use_cuda:
    top_only_model = top_only_model.cuda()
optimizer = optim.Adam(top_only_model.parameters(), lr=learning_rate)
val_loss_history, val_acc_history = fit(top_only_train_loader, optimizer, criterion, top_only_model, epochs=1500, evaluate=True)
Get the number of epochs with the best validation loss, and use it to retrain on all the data (yes, I'd like every tiny bit of improvement :)
best_epochs = int(np.argmin(np.array(val_loss_history))) + 1  # argmin is 0-based, epochs count from 1
best_epochs
best_val_loss = min(val_loss_history)
best_val_loss
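Before trusting a single argmin, it can help to eyeball the validation curves; a quick plot (not in the original notebook):

plt.plot(val_loss_history, label='val loss')
plt.plot(val_acc_history, label='val acc')
plt.axvline(best_epochs - 1, color='gray', linestyle='--')  # position of the best epoch
plt.xlabel('epoch')
plt.legend()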
# retrain a fresh model on all the data for the best number of epochs
top_only_model = TopModule(dropout_ratio)
if use_cuda:
    top_only_model = top_only_model.cuda()
optimizer = optim.Adam(top_only_model.parameters(), lr=learning_rate)
fit(total_loader, optimizer, criterion, top_only_model, epochs=best_epochs, evaluate=False)
extraction_nasnet = get_extraction_model()
if use_cuda:
    extraction_nasnet.cuda()
Get the test bottleneck features.
bottlenecks_test_x,test_y = get_bottlenecks(test_loader,extraction_nasnet,True)
np.save(bottlenecks_folder/(name+'_test_x'),bottlenecks_test_x)
del extraction_nasnet
Remember to switch the top model to eval mode, because it uses dropout.
top_only_model.eval()
Generate the final prediction.
x_test = Variable(torch.FloatTensor(bottlenecks_test_x), volatile=True)
if use_cuda:
    x_test = x_test.cuda()
pred_np = nn.Softmax(1)(top_only_model(x_test)).cpu().data.numpy()  # logits -> per-class probabilities
df_pred = pd.DataFrame(pred_np,index=df_test.index,columns=df_test.columns)
df_pred.to_csv(pred_folder/(name+'.csv'))
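A quick sanity check on the submission (my own habit, not from the original): the frame should match the sample submission's shape and each row should sum to one.

assert df_pred.shape == df_test.shape
assert np.allclose(df_pred.sum(axis=1), 1.0, atol=1e-4)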
- By simply using the NASNet pretrained model, I got a score of 0.157.
- The final score of 0.137 was achieved through pseudo labeling and ensembling with results from other models.
- Center cropping is better than resizing.
- I tried data augmentation, which did not help. I think this is because ImageNet already contains a lot of pictures of different dog breeds, so the model has already learned enough feature detectors in its upper layers.
- I tried nasnet, inceptionv4, inceptionresnetv2, dpn107, xception, resnet152, inceptionv3 and some other models; their relative performance on this task mirrors their ImageNet results. Hence, I guess a better ImageNet model gives better transfer learning performance, at least here. Fair enough.
- I also tried binding the bottleneck features from different models together and training a linear classifier on them. It works, and served as an important part of the ensemble (a sketch follows at the end of the post).
Things I could still try:
- Find the classes with the highest error rates and do something about them.
- Play more with the input resolution. I just used the original ImageNet input size; I wonder whether higher-resolution pictures would help.
- Another round of pseudo labeling.
- K-fold validation.
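As a sketch of the feature-binding ensemble mentioned above: concatenate the cached bottleneck features from several extractors along the feature axis, then train the same kind of linear classifier on the wider vectors. The file names and the 1536-d second model below are placeholders, not the exact ones I used.

# Hypothetical cache file names -- substitute your own.
x_nasnet = np.load(bottlenecks_folder / 'nasnet_x.npy')      # (N, 4032)
x_incv4 = np.load(bottlenecks_folder / 'inceptionv4_x.npy')  # (N, 1536)
y = np.load(bottlenecks_folder / 'nasnet_y.npy')

# bind the features along the feature axis: (N, 4032 + 1536)
x_combined = np.concatenate([x_nasnet, x_incv4], axis=1)

# the same kind of top classifier as before, just with a wider input
class CombinedTop(nn.Module):
    def __init__(self, in_features, dropout_ratio):
        super(CombinedTop, self).__init__()
        self.dropout = nn.Dropout(p=dropout_ratio)
        self.aff = nn.Linear(in_features, 120, bias=use_bias)

    def forward(self, x):
        return self.aff(self.dropout(x))

combined_model = CombinedTop(x_combined.shape[1], dropout_ratio)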