The repository contains different models implemented in Two-stream FCAN. If you feel this repository useful, please cite our paper:
@inproceedings{AnTran_ICCV_2017,
author = {An Tran and
Loong-Fah Cheong},
title = {Two-stream Flow-guided Convolutional Attention Networks for Action Recognition},
booktitle = {The IEEE International Conference on Computer Vision Workshop (ICCVW)},
year = {2017},
}
Plan to release the source codes:
- Write build_all.sh script
- Release scripts to create database file
- Release test prototxt file
- Release caffemodel file
- Release classification scripts
- Write README.md file.
The following is the guidance to reproduce the reported results and extend to more datasets.
Similar to TSN repository, the major libraries we use are
Our my-very-deep-caffe
is a modified fork of BLVC Caffe and our dense_flow
software is modifications from Wang Limin's dense flow.
We choose to keep our my-very-deep-caffe
to be aligned with the original of Caffe's fork commit 5a201dd960840c319cefd9fa9e2a40d2c76ddd73.
We would like to preserve the strength of BLVC Caffe software which is a deep learning framework made with expression, speed, and modularity
in mind. Our software also inherits training mechanism from multiple GPUs from BLVC Caffe.
Our models are released under folder models
. With extensibility and simplicity in mind, we organize steps of training, extracting features and evaluating models of a model in the same folder. If we would like to extend the training to other datasets, we only need to copy a sample folder into a
folder and make some necessary modifications.
Use git to clone this repository and its submodules
git clone --recursive https://github.com/antran89/two-stream-fcan.git
Then run the building scripts to build the libraries.
bash build_all.sh
It will build Caffe and dense_flow. Since we need OpenCV to have Video IO, which is absent in most default installations, it will also download and build a local installation of OpenCV and use its Python interfaces.
We experimented on three mainstream action recognition datasets: UCF-101, HMDB51 and Hollywood2. Videos can be downloaded directly from their websites.
After download, please extract the videos from the rar
archives.
- UCF101: the ucf101 videos are archived in the downloaded file. Please use
unrar x UCF101.rar
to extract the videos. - HMDB51: the HMDB51 video archive has two-level of packaging. The following commands illustrate how to extract the videos.
mkdir rars && mkdir videos
unrar x hmdb51-org.rar rars/
for a in $(ls rars); do unrar x "rars/${a}" videos/; done;
To run the training and testing, we need to extract frames of video, also the temporal-C3D networks need optical flow or compensated optical flow images for input.
For UCF101, the extraction can be achieved with the script lib/dense_flow/python/runme_dense_flow_ucf101.sh
. We can modify some key elements of scripts:
VIDEO_FOLDER
points to the folder where you put the video datasetIMG_FOLDER
andFLOW_FOLDER
point to the root folder where the extracted frames and optical images will be put inNUM_WORKERS
specifies the number of processes to use in parallel for flow extraction on 1 GPUCUDA_VISIBLE_DEVICES
specifies GPU id to run extraction, default gpu 0- Other variables are self-explainable.
The command for running frames and optical flow extraction is as follows
cd lib/dense_flow/python
bash runme_dense_flow_ucf101.sh
It will take from several hours to several days to extract optical flows for the whole datasets, depending on the number of GPUs.
In order to train/test video classification models, we need to have a text file lists of all video segments. An example of our video snippets database is in internal-data/video-snippets-database
. We need to generate the video snippets again for each dataset, because data is different for different users (e.g., number of video frames). The format of video snippets as following:
PlayingPiano/v_PlayingPiano_g20_c02 0017 63
BlowingCandles/v_BlowingCandles_g14_c03 0016 13
BreastStroke/v_BreastStroke_g21_c02 0064 18
GolfSwing/v_GolfSwing_g14_c03 0011 32
Archery/v_Archery_g16_c05 0046 2
BreastStroke/v_BreastStroke_g21_c01 0073 18
with three columns corresponding to video file, frame index, and class id (0-index).
To create database, modify FLOW_FOLDER
in run_create_database.sh
file in folder script_create_databases/ucf101_overlappingsnippet_database
, then run:
cd script_create_databases/ucf101_overlappingsnippet_database/
bash run_create_database.sh
We provided the trained model weights in Caffe style, consisting of specifications in Protobuf messages, and model weights. In the codebase we provide the model spec for UCF101 and HMDB51. The model weights can be downloaded by running the script
bash pre-trained-models/get_pretrained_models.sh
NOTE: Before running training or testing a model, we must first generate prefix index file (e.g., ucf101_train_flow_len16_split01.txt
), also symbolic links for flow and/or rgb dataset folders. This is only needed to do one time for each model (i.e., each folder contains models in folder models
).
cd models/fcan/ucf101_ltc_fcan_comp_snippet/scripts-create-database/
bash run_create_database.sh
To test a model (e.g., ucf101_ltc_fcan_comp_snippet
), let run:
cd models/fcan/ucf101_ltc_fcan_comp_snippet/extract_features/
bash run_1view_feature_extraction.sh
The results are saved in foler evaluate_models
.
To test combined multiple features of two-stream FCAN, we refer to the folder classify-video-multi-features
.
cd classify-video-multi-features/UCF101
bash run_classify_multi_features_ucf101_2features.sh
NOTE: Before running training or testing a model, we must first generate prefix index file (e.g., ucf101_train_flow_len16_split01.txt
), also symbolic links for flow and/or rgb dataset folders. This is only needed to do one time for each model (i.e., each folder contains models in folder models
).
To train a model (e.g., ucf101_ltc_fcan_comp_snippet
), after preparing all necessary files, let run (see an example in script run_job.sh
):
cd models/fcan/ucf101_ltc_fcan_comp_snippet
bash train_action_recognition_rgb_fcan.sh > c3d_rgb_fcan_pool1_sz112_len16_split2_bs64_fi2.log 2>&1
Please look at an example code in a folder and the above guidances to extend the training process to other splits or other datasets.
For any question, please contact
An Tran: an.tran@u.nus.edu