
python3 multimodal_concat/train.py -n <wandb project name> --lr <learning rate> --batch_size <batch_size> --seed <random seed>
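
The flags in the command above could be parsed roughly as follows. This is a minimal sketch, not the actual contents of `train.py`: the long option `--name`, the default values, and the helper names `build_parser` and `parse_and_seed` are assumptions made for illustration.

```python
import argparse
import random


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Train the multimodal concat model")
    # -n is taken from the command; the long form --name is an assumption.
    parser.add_argument("-n", "--name", required=True, help="wandb project name")
    # All default values below are assumptions, not the repository's defaults.
    parser.add_argument("--lr", type=float, default=1e-4, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=32, help="batch size")
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    return parser


def parse_and_seed(argv=None) -> argparse.Namespace:
    """Parse the arguments and seed Python's RNG for reproducibility."""
    args = build_parser().parse_args(argv)
    # A real training script would likely also seed its ML framework here.
    random.seed(args.seed)
    return args


# Example invocation with explicit arguments instead of sys.argv.
args = parse_and_seed(["-n", "demo", "--lr", "3e-4", "--batch_size", "16", "--seed", "42"])
print(args.name, args.lr, args.batch_size, args.seed)
```

Passing the argument list explicitly keeps the sketch testable without touching the real command line.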

Notes:

Each modality's features are the last hidden state of a modality-specific model.
Text features are taken from the [CLS] token.
Video features are pooled from 16 frames into a single vector.
Audio features are a work in progress.
For now, the model consists of two linear layers.
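
The notes above can be illustrated with a dependency-free sketch that uses plain Python lists in place of tensors. Several details are assumptions rather than facts about the repository: mean pooling over frames (the notes only say "pooled"), the ReLU between the two linear layers, the toy hidden size, the three output classes, and all function names.

```python
import random

NUM_FRAMES = 16  # from the notes: 16 video frames are pooled into 1
HIDDEN = 4       # toy hidden size (assumption); real encoders are much wider


def cls_features(last_hidden_state):
    """Text features: the hidden state at position 0, i.e. the [CLS] token."""
    return last_hidden_state[0]


def mean_pool_frames(frame_features):
    """Video features: average the per-frame vectors (mean pooling is an assumption)."""
    n = len(frame_features)
    return [sum(column) / n for column in zip(*frame_features)]


def linear(x, weight, bias):
    """Compute y = W x + b for one vector x; weight has shape (out, in)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(rng, n_in, n_out):
    """Small random weights and zero biases for one linear layer."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def forward(text_hidden, video_frames, params):
    """Concatenate text and video features, then apply two linear layers."""
    fused = cls_features(text_hidden) + mean_pool_frames(video_frames)  # list concatenation
    (w1, b1), (w2, b2) = params
    # The ReLU between the layers is an assumption; the notes only mention two linear layers.
    return linear(relu(linear(fused, w1, b1)), w2, b2)


rng = random.Random(0)
text_hidden = [[rng.random() for _ in range(HIDDEN)] for _ in range(6)]  # 6 tokens
video_frames = [[rng.random() for _ in range(HIDDEN)] for _ in range(NUM_FRAMES)]
params = [init_linear(rng, 2 * HIDDEN, 8), init_linear(rng, 8, 3)]  # 3 classes (assumed)
logits = forward(text_hidden, video_frames, params)
print(len(logits))  # one score per assumed class
```

Audio is left out of the sketch because the notes mark it as work in progress; a third feature vector would be appended to the concatenation in the same way.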

Repository contents:

audio_based
multimodal_concat
text_based
video_based
README.md
requirements.txt
