This repository focuses on few-shot, zero-shot, and meta-learning problems, with the goal of providing readable, clean, and tested code. Below are implementations of few-shot algorithms for image classification, based on the following papers:
- Generalizing from a Few Examples: A Survey on Few-Shot Learning (Wang, Yao et al. (2020))
- Prototypical Networks for Few-shot Learning (Snell et al. (2017))
- Matching Networks for One Shot Learning (Vinyals et al. (2016))
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (Finn et al. (2017))
- Learning to Compare: Relation Network for Few-Shot Learning (Sung et al. (2018))
- Optimization as a Model For Few-Shot Learning (Ravi and Larochelle (2017))
- How To Train Your MAML (Antoniou et al. (2018))
- Theory and Concepts
- Implementation in PyTorch
- Few Shot Learning in CVPR 2019
Machine learning has advanced rapidly, driven in large part by growth in computational resources, and has been highly successful in data-intensive applications; however, it often struggles when only a small amount of data is available. Few-shot learning (FSL) has recently been proposed to tackle this problem. Using prior knowledge, FSL can generalize to new tasks containing only a few supervised samples. Based on how prior knowledge is used to handle this core issue, FSL methods fall into three categories: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.
Consider a learning task T. FSL deals with a data set D = {Dtrain, Dtest} consisting of a training set Dtrain = {(xi, yi)}, i = 1, ..., I, where I is small, and a test set Dtest = {xtest}. Let p(x, y) be the ground-truth joint probability distribution of input x and output y, and ĥ the optimal hypothesis from x to y. FSL learns to discover ĥ by fitting Dtrain and testing on Dtest. To approximate ĥ, the FSL model determines a hypothesis space H of hypotheses h(θ), where θ denotes all the parameters used by h. A parametric h is used here, since a nonparametric model often requires large data sets and is thus not suitable for FSL. The figure below illustrates the different perspectives from which FSL methods tackle the problem.
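In practice, such a task is usually constructed as an "episode": sample k classes, then split each class's few samples into a support set (Dtrain) and a query set (Dtest). As a rough sketch of this sampling (the `sample_episode` helper and the toy data are illustrative assumptions, not this repo's actual data loaders):

```python
import random

def sample_episode(data_by_class, k_way=2, n_shot=2, q_queries=2):
    """Sample one k-way, n-shot episode: a small support set Dtrain
    and a query set Dtest drawn from the same k classes."""
    classes = random.sample(sorted(data_by_class), k_way)
    support, query = [], []
    for label in classes:
        samples = random.sample(data_by_class[label], n_shot + q_queries)
        support += [(x, label) for x in samples[:n_shot]]   # I = k_way * n_shot is small
        query += [(x, label) for x in samples[n_shot:]]
    return support, query

# Toy stand-in for a dataset: 3 classes with 5 samples each
data = {c: [f"{c}_{i}" for i in range(5)] for c in ("shirt", "dress", "shoe")}
support, query = sample_episode(data, k_way=2, n_shot=2, q_queries=2)
# len(support) == 4 (2 classes x 2 shots), len(query) == 4
```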
To achieve strong few-shot performance, Snell et al. apply a compelling inductive bias in the form of class prototypes. The assumption is that there exists an embedding space in which the samples of each class cluster around a prototypical representation, which is simply the mean of the embedded samples of that class. In the n-shot classification problem with n > 1, classification is performed by assigning a query to the class of its closest prototype. The paper also gives a strong theoretical argument for using Euclidean distance over cosine distance, under which the prototype is exactly the class mean. Prototypical Networks also work for zero-shot learning, where prototypes can be learned from rich attributes or natural-language descriptions, e.g. "color", "master category", "season", and "product display name".
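The prototype computation and nearest-prototype classification described above can be sketched as follows (NumPy for brevity; the actual repo code first embeds images with a PyTorch network, which is omitted here):

```python
import numpy as np

def prototypes(support_emb, support_labels):
    """Class prototype = mean of the support embeddings of that class."""
    classes = sorted(set(support_labels))
    labels = np.array(support_labels)
    return np.stack([support_emb[labels == c].mean(axis=0) for c in classes]), classes

def classify(query_emb, protos, classes):
    """Assign each query to the class of its closest prototype (squared L2)."""
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return [classes[i] for i in d2.argmin(axis=1)]

# Two well-separated toy "embedding" clusters
support = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
protos, classes = prototypes(support, ["a", "a", "b", "b"])  # means (0, 0.5), (5, 5.5)
print(classify(np.array([[0., 0.4], [5., 5.4]]), protos, classes))  # ['a', 'b']
```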
The objective of meta-learning algorithms is to optimize meta-parameters. More precisely, such an algorithm has access to the training loss and some meta-parameters, and outputs learned (ideally optimal) parameters. Model-Agnostic Meta-Learning (MAML) is an optimization algorithm compatible with any model that learns through gradient descent. The meta-parameters are a point of initialization for SGD, shared between all the independent tasks. Since the SGD update is differentiable, the gradients with respect to the meta-parameters can be computed simply through backpropagation.
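The inner/outer loop can be made concrete with a minimal first-order MAML (FOMAML) sketch on toy scalar regression tasks y = a*x. Everything here (the `fomaml_step` helper, the toy tasks, and dropping the second-order term) is an illustrative assumption, not this repo's implementation, which uses PyTorch and supports both 1st- and 2nd-order updates:

```python
def fomaml_step(theta, tasks, inner_lr=0.05, meta_lr=0.05):
    """One first-order MAML meta-update for scalar linear models y = theta * x.
    Full MAML backpropagates through the inner SGD step; FOMAML drops that
    second-order term and evaluates the task gradient at the adapted parameters."""
    def grad(t, xs, ys):  # d/dt of mean((t*x - y)^2)
        return sum(2 * x * (t * x - y) for x, y in zip(xs, ys)) / len(xs)

    meta_grad = 0.0
    for xs, ys in tasks:
        theta_i = theta - inner_lr * grad(theta, xs, ys)  # inner-loop adaptation
        meta_grad += grad(theta_i, xs, ys)                # first-order meta-gradient
    return theta - meta_lr * meta_grad / len(tasks)

tasks = [([1.0, 2.0], [2.0, 4.0]),   # task with slope 2
         ([1.0, 2.0], [4.0, 8.0])]   # task with slope 4
theta = 0.0
for _ in range(50):
    theta = fomaml_step(theta, tasks)
# theta converges near 3.0: the initialization that adapts fastest to both tasks
```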
This codebase requires Python 3.5 or higher. We recommend using Anaconda or Miniconda to set up the virtual environment. Here is a walkthrough of the installation and setup.
Clone the repository and create the environment:

```bash
git clone https://github.com/Shandilya21/few_shot_research.git
cd few_shot_research
conda create -n few_shot python=3.5
conda activate few_shot
```
Install the supporting libraries and packages from `requirements.txt`:

```bash
pip install -r requirements.txt
```
Download the data and place it inside the data folder, then extract the zip files. Edit `DATA_PATH` in `config.py`, replacing it with the appropriate dataset path.
To prepare the fashionNet dataset, run:

```bash
python script/prepare_fashionNet.py
```

For dataset details, refer to `data/fashionNet/README.md`.
To run the experiments, make the script executable and launch it:

```bash
chmod +x experiments/run.sh
./experiments/run.sh
```
To reproduce the results on the fashionNet dataset, download the preprocessed data and checkpoints, and place the files inside `DATA_PATH/fashionNet/`.
Run `experiments/proto_nets.py` to reproduce the results using Prototypical Networks.
Arguments

- `dataset`: {'fashionNet'}
- `distance`: {'l2', 'cosine'}. Which distance metric to use
- `n-train`: Support samples per class for training tasks
- `n-test`: Support samples per class for validation tasks
- `k-train`: Number of classes in training tasks
- `k-test`: Number of classes in validation tasks
- `q-train`: Query samples per class for training tasks
- `q-test`: Query samples per class for validation tasks
In the Prototypical Networks paper, the authors present strong arguments for Euclidean distance over cosine distance: under squared Euclidean distance, the prototype coincides with the class mean of the embedded samples. We reproduce this comparison in the experiments below.
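The theoretical point is that squared Euclidean distance is a Bregman divergence, for which the representative minimizing the total divergence to a set of points is exactly their mean. A quick numeric sanity check (toy points, not repo code):

```python
import numpy as np

points = np.array([[1., 2.], [3., 4.], [5., 0.]])  # embeddings of one class
mean = points.mean(axis=0)                         # the prototype under squared L2

def total_sq_l2(c):
    """Sum of squared Euclidean distances from all points to candidate c."""
    return ((points - c) ** 2).sum()

# Any perturbation of the mean increases the total squared distance,
# so the class mean is the optimal prototype under this metric.
for delta in (np.array([0.1, 0.0]), np.array([0.0, -0.1]), np.array([0.2, 0.2])):
    assert total_sq_l2(mean) < total_sq_l2(mean + delta)
```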
Small version | 1 | 2 | 3 |
---|---|---|---|
k - ways | 2 | 3 | 5 |
n - shots | 2 | 4 | 5 |
This Repo (l2) | 80.2 | 77.5 | 84.74 |
This Repo (Cos) | 72.5 | 73.88 | 77.68 |
Run `experiments/maml.py` to reproduce the results using MAML. (Refer to the Theory section for details.)
Arguments

- `dataset`: {'omniglot', 'miniImageNet'}. Whether to use the Omniglot or miniImageNet dataset
- `distance`: {'l2', 'cosine'}. Which distance metric to use
- `n`: Support samples per class for few-shot tasks
- `k`: Number of classes in training tasks
- `q`: Query samples per class for training tasks
- `inner-train-steps`: Number of inner-loop updates to perform on training tasks
- `inner-val-steps`: Number of inner-loop updates to perform on validation tasks
- `inner-lr`: Learning rate to use for inner-loop updates
- `meta-lr`: Learning rate to use when updating the meta-learner weights
- `meta-batch-size`: Number of tasks per meta-batch
- `order`: Whether to use 1st or 2nd order MAML
- `epochs`: Number of training epochs
- `epoch-len`: Meta-batches per epoch
- `eval-batches`: Number of meta-batches to use when evaluating the model after each epoch
Small version | Order | 1 | 2 | 3 |
---|---|---|---|---|
k - ways | 2 | 5 | 5 | |
n - shots | 1 | 3 | 5 | |
This Repo | 1 | 92.67 | 90.65 | 93.23 |
- Multimodal Few Shot Classification.
- Zero Shot Image Classification.
Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first.
- Thanks to oscarknagg for the reference PyTorch implementation.