Sample code for few-shot action recognition on UCF101.
The sampler in UCF101.py optionally supports AutoAugment [1] to generate padding frames when a video has too few frames.
- torch>=1.6.0
- torchvision>=0.7.0
- tensorboard>=2.3.0
Download and extract frames from the UCF101 videos (see UCF101 Frame Extractor).
Split the dataset for few-shot learning (if you already have the csv files, you can skip this step):
python splitter.py --frames-path /path/to/frames --labels-path /path/to/labels --save-path /path/to/save
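The split step above writes train/test csv files that the sampler later reads. A minimal sketch of the idea, assuming a class-level split into disjoint train and test sets (the function names, file layout, and split ratio here are illustrative, not the repository's actual splitter.py logic):

```python
import csv
import random

def split_classes(class_names, num_train, seed=0):
    """Randomly split class names into disjoint train/test class sets."""
    rng = random.Random(seed)
    shuffled = class_names[:]
    rng.shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:]

def write_split_csv(path, class_names):
    """Write one class name per row, to be read back by the sampler."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for name in class_names:
            writer.writerow([name])

# toy example: in the real dataset this would be all 101 UCF101 classes
classes = ["ApplyEyeMakeup", "Basketball", "Diving", "Fencing", "JumpRope"]
train, test = split_classes(classes, num_train=3)
```

Because the train and test *classes* are disjoint, the model is evaluated on categories it never saw during training, which is what makes the setting few-shot.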
train(resnet18)
python train.py --frames-path /path/to/frames --save-path /path/to/save --tensorboard-path /path/to/tensorboard --model resnet --uniform-frame-sample --learning-rate 5e-4 --frame-size 168 --way 5 --shot 1 --query 5
train(r2plus1d18)
python train.py --frames-path /path/to/frames --save-path /path/to/save --tensorboard-path /path/to/tensorboard --model r2plus1d --uniform-frame-sample --metric cosine --way 5 --shot 1 --query 5
test(resnet18)
python test.py --frames-path /path/to/frames --load-path /path/to/load --use-best --model resnet --frame-size 168 --way 5 --shot 1 --query 5
test(r2plus1d18)
python test.py --frames-path /path/to/frames --load-path /path/to/load --use-best --model r2plus1d --metric cosine --way 5 --shot 1 --query 5
device information: GPU: RTX 2080 Ti (11GB)
data settings: train classes: 71 (9473 videos), test (val) classes: 30 (3847 videos)
option settings
frame size: 112 (r2plus1d), 168 (resnet)
num epochs: 30
train iter size: 100
val iter size: 200
metric: cosine
random pad sample: False
pad option: default
uniform frame sample: True
random start position: False
max interval: 7
random interval: False
sequence length: 35
num_layers:1 (resnet)
hidden_size: 512 (resnet)
learning rate: 1e-4 (r2plus1d), 5e-4 (resnet)
scheduler step: 10
scheduler gamma: 0.9
way: 5
shot: 1
query: 5
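The scheduler settings above (step 10, gamma 0.9) describe a StepLR-style decay: the learning rate is multiplied by gamma every `scheduler step` epochs. A minimal sketch of the resulting schedule for the resnet settings (pure Python here for clarity; the repository would use `torch.optim.lr_scheduler.StepLR`):

```python
def stepped_lr(base_lr, epoch, step_size=10, gamma=0.9):
    """StepLR-style decay: multiply the base lr by gamma every step_size epochs."""
    return base_lr * (gamma ** (epoch // step_size))

# resnet settings from the table above: base lr 5e-4, step 10, gamma 0.9
for epoch in (0, 10, 20):
    print(epoch, stepped_lr(5e-4, epoch))
```

With 30 epochs the learning rate therefore decays twice, ending at 0.81x its initial value.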
required video memory: resnet: about 7538 MB, r2plus1d: about 10042 MB
All accuracy results are averaged over 6000 test episodes and reported with 95% confidence intervals.
| model | Accuracy |
| --- | --- |
| resnet18 | 70.08 ±0.32 |
| r2plus1d18 | 94.29 ±0.67 |
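The "mean ± interval" numbers above can be computed from the per-episode accuracies. A sketch using the standard normal approximation (1.96 standard errors); whether the repository's test.py uses exactly this formula is an assumption:

```python
import math

def mean_ci95(values):
    """Mean and 95% confidence half-width (normal approximation, sample std)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, 1.96 * math.sqrt(var / n)

# per-episode accuracies (in %) for the 6000 test episodes would go here
episode_accs = [70.0, 80.0, 60.0, 75.0, 65.0]
mean, half = mean_ci95(episode_accs)
print(f"{mean:.2f} ±{half:.2f}")
```

Averaging over 6000 episodes keeps the interval narrow even though individual 5-way 1-shot episodes are noisy.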
- model: selects the normalization values specific to each model
- frames_path: path to the extracted frames
- labels_path: path to the labels
- frame_size: frame size (width and height must be the same)
- sequence_length: number of frames to sample per video
- setname: sampling mode; if set to 'train', the sampler reads 'train.csv' to load the training dataset [default: 'train', other: 'test']
- random_pad_sample: when a video has too few frames, randomly re-sample from the existing frames to create the padding; if False, only the first frame is repeated [default: True, other: False]
- pad_option: if set to 'autoaugment', padding frames are augmented with AutoAugment policies [default: 'default', other: 'autoaugment']
- uniform_frame_sample: sample frames at a fixed interval; if False, frames are sampled without regard to the interval [default: True, other: False]
- random_start_position: choose the starting position randomly, taking the interval into account; if False, the starting position is always 0 [default: True, other: False]
- max_interval: maximum frame interval; the higher this value, the more likely parts of the video sequence are missed [default: 7]
- random_interval: choose the interval randomly; if False, the maximum interval is always used [default: True, other: False]
- labels: accepts only the classes from the csv files, so this value must be UCF101.classes
- iter_size: number of episodes per epoch (total episodes = epochs * iter_size)
- way: number of ways (classes per episode)
- shot: number of shots (support examples per class)
- query: number of query examples per class
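How the frame-sampling options fit together can be sketched as follows. This is a simplified illustration of the behavior described above (fixed interval capped by max_interval, optional random start and interval, padding by re-sampling); the actual UCF101.py implementation may differ in details:

```python
import random

def sample_frame_indices(num_frames, sequence_length=35, max_interval=7,
                         random_start_position=True, random_interval=True,
                         random_pad_sample=True, rng=random):
    """Pick sequence_length frame indices from a video with num_frames frames."""
    if num_frames >= sequence_length:
        # largest uniform interval that still fits, capped by max_interval
        interval = min(max_interval, num_frames // sequence_length)
        if random_interval and interval > 1:
            interval = rng.randint(1, interval)  # else: always use the maximum
        span = (sequence_length - 1) * interval + 1
        start = rng.randint(0, num_frames - span) if random_start_position else 0
        return [start + i * interval for i in range(sequence_length)]
    # too few frames: pad by re-sampling existing frames
    indices = list(range(num_frames))
    need = sequence_length - num_frames
    if random_pad_sample:
        pads = [rng.randrange(num_frames) for _ in range(need)]
    else:
        pads = [0] * need  # repeat only the first frame
    return sorted(indices + pads)
```

A larger max_interval spreads the sampled frames over more of the video, at the cost of skipping more frames in between.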
*way, shot, query => we follow the episodic training strategy [2]
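Under the episodic strategy [2], each episode samples `way` classes and, for each, `shot` support and `query` query examples. A self-contained sketch of that sampling step (the data layout is a toy stand-in, not the repository's dataset class):

```python
import random

def sample_episode(labels_to_items, way=5, shot=1, query=5, rng=random):
    """Sample one few-shot episode: way classes, shot support + query queries per class."""
    classes = rng.sample(sorted(labels_to_items), way)
    support, queries = [], []
    for cls in classes:
        items = rng.sample(labels_to_items[cls], shot + query)
        support += [(cls, it) for it in items[:shot]]   # labeled examples
        queries += [(cls, it) for it in items[shot:]]   # to be classified
    return support, queries

# toy data: 8 classes with 10 videos each
data = {f"class{c}": [f"video{c}_{i}" for i in range(10)] for c in range(8)}
sup, qry = sample_episode(data)
```

With the settings above (5-way 1-shot, 5 queries), each episode has 5 support videos and 25 query videos.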
[1] Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le, "AutoAugment: Learning Augmentation Strategies From Data", Computer Vision and Pattern Recognition (CVPR), 2019, pp. 113-123.
[2] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra, "Matching Networks for One Shot Learning", Neural Information Processing Systems (NIPS), 2016, pp. 3630-3638.