An Encoder-Decoder Model for Sequence-to-Sequence Learning: Video to Text
MSVD Dataset (Download)
1450 videos for training, 100 videos for testing
The input features are extracted with a VGG network pretrained on ImageNet.
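As a rough illustration, here is a minimal Keras sketch of an encoder-decoder model over per-frame VGG features. The frame count, hidden size, vocabulary size, and caption length below are assumptions for the example; the actual model in this repo may differ.

```python
# Minimal encoder-decoder (seq2seq) sketch over VGG frame features.
# Sizes below are placeholder assumptions, not the repo's settings.
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

num_frames, feat_dim = 80, 4096         # VGG features per video (assumed)
vocab_size, max_caption_len = 3000, 40   # hypothetical text-side sizes
hidden = 256

# Encoder: read the sequence of frame features, keep the final LSTM state.
enc_in = Input(shape=(num_frames, feat_dim))
_, state_h, state_c = LSTM(hidden, return_state=True)(enc_in)

# Decoder: generate the caption conditioned on the encoder state.
dec_in = Input(shape=(max_caption_len,))
dec_emb = Embedding(vocab_size, hidden)(dec_in)
dec_seq, _, _ = LSTM(hidden, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
dec_out = Dense(vocab_size, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], dec_out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```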
usage: video2text.py [-h] --uid UID [--train_path TRAIN_PATH]
[--test_path TEST_PATH] [--learning_rate LEARNING_RATE]
[--batch_size BATCH_SIZE] [--epoch EPOCH] [--test]
Video to Text Model
optional arguments:
-h, --help show this help message and exit
--uid UID training uid
--train_path TRAIN_PATH
training data path
--test_path TEST_PATH
test data path
--learning_rate LEARNING_RATE
learning rate for training
--batch_size BATCH_SIZE
batch size for training
--epoch EPOCH epochs for training
--test use this flag for testing
Split the pre-extracted video features into training and testing directories. For training, you may want to preprocess the data first.
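For example, here is a minimal sketch of splitting per-video feature files into the two directories; the .npy file layout, video-id convention, and directory names are assumptions, not necessarily the repo's exact format.

```python
# Sketch: copy one feature file per video into train/test directories.
# File naming and directory layout here are assumptions.
import os
import shutil

def split_features(feat_dir, train_ids, test_ids,
                   train_dir="data/train", test_dir="data/test"):
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for fname in os.listdir(feat_dir):
        vid = os.path.splitext(fname)[0]   # video id taken from file name
        if vid in train_ids:
            shutil.copy(os.path.join(feat_dir, fname), train_dir)
        elif vid in test_ids:
            shutil.copy(os.path.join(feat_dir, fname), test_dir)
```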
For testing, use the --test flag. Here is a sample command to generate the testing results:
python video2text.py --uid best --test
This generates the video-to-text output at test_ouput.txt; the average BLEU score is 0.69009423.
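If you want to recompute the score yourself, here is a minimal sketch of averaging sentence-level BLEU with NLTK; the repo's own evaluation may use a different tokenization or BLEU variant, so the numbers need not match exactly.

```python
# Sketch: average sentence-level BLEU over generated captions with NLTK.
# Whitespace tokenization and the smoothing choice are assumptions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def average_bleu(references, hypotheses):
    # references: list of lists of reference captions (strings) per video
    # hypotheses: list of generated captions (strings), in the same order
    smooth = SmoothingFunction().method1
    scores = []
    for refs, hyp in zip(references, hypotheses):
        refs_tok = [r.lower().split() for r in refs]
        hyp_tok = hyp.lower().split()
        scores.append(sentence_bleu(refs_tok, hyp_tok,
                                    smoothing_function=smooth))
    return sum(scores) / len(scores)
```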
For more information, check out the report.
Reference: Keras Blog, "A ten-minute introduction to sequence-to-sequence learning in Keras"