An encoder-decoder deep learning model (with and without an attention mechanism) that takes an Arabic sign-language video as input and outputs its translation in text format.

Note: For the detailed model architecture and preprocessing, refer to the Video Captioning.ipynb file.
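The exact architecture and preprocessing live in Video Captioning.ipynb; the snippet below is only a minimal sketch of the general encoder-decoder idea, assuming PyTorch, pre-extracted per-frame features, and a GRU encoder/decoder, and omitting the optional attention mechanism. All names here (`Encoder`, `Decoder`, `feat_dim`, `vocab_size`) are illustrative and not taken from the repository.

```python
# Minimal illustrative sketch, NOT the repository's exact model.
# Assumes pre-extracted per-frame features of shape (batch, 80, feat_dim).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, frames):              # frames: (B, 80, feat_dim)
        outputs, hidden = self.gru(frames)  # outputs: (B, 80, H); hidden: (1, B, H)
        return outputs, hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token, hidden):            # token: (B, 1) previous word id
        emb = self.embed(token)                  # (B, 1, E)
        output, hidden = self.gru(emb, hidden)   # (B, 1, H)
        return self.out(output.squeeze(1)), hidden  # logits: (B, vocab_size)

# Example: encode a batch of 4 clips, then decode one step from a start token.
enc = Encoder(feat_dim=512, hidden_dim=256)
dec = Decoder(vocab_size=100, embed_dim=128, hidden_dim=256)
_, h = enc(torch.randn(4, 80, 512))
logits, h = dec(torch.zeros(4, 1, dtype=torch.long), h)
```

At inference time the decoder is run step by step, feeding each predicted word back in until an end token is produced; an attention variant would additionally weight the encoder outputs at every decoding step.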
The dataset consists of:
- 534 video samples in total.
- 10 different sentences performed by three signers.
- Each video sample is already normalized to 80 frames (see the sketch after this list).
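For context, a common way to normalize variable-length clips to a fixed frame count is uniform index sampling. The sketch below, assuming OpenCV, is illustrative only; the repository's actual preprocessing is in Video Captioning.ipynb, and `load_fixed_frames` is a hypothetical helper, not a function from this repo.

```python
# Illustrative sketch of normalizing a video to a fixed number of frames.
import cv2
import numpy as np

def load_fixed_frames(video_path, num_frames=80):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    if not frames:
        raise ValueError(f"Could not read any frames from {video_path}")
    # Uniformly sample indices so every clip ends up with num_frames frames
    # (indices repeat when the clip is shorter than num_frames).
    idx = np.linspace(0, len(frames) - 1, num_frames).astype(int)
    return np.stack([frames[i] for i in idx])  # (num_frames, H, W, 3)
```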
Requirements
- python >= 3.6

Installation
```shell
git clone https://github.com/AI-14/video-captioning-for-arabic-sign-language-recognition-at-sentence-level.git  # clones the repository
cd video-captioning-for-arabic-sign-language-recognition-at-sentence-level
py -m venv yourVenvName          # creates a virtual environment
pip install -r requirements.txt  # installs all modules
```