Zhengdi Yu1,2 · Shaoli Huang2 · Yongkang Cheng2 · Tolga Birdal1
1Imperial College London, 2Tencent AI Lab
ECCV 2024
SignAvatars is the first large-scale 3D sign language holistic motion dataset with mesh annotations, comprising 8.34M precise 3D whole-body SMPL-X annotations covering 70K motion sequences. A corresponding MANO hand version is also provided.
- [2024/12/4] Visualization code is now provided. ⭐
- [2023/11/2] Paper is now available. ⭐
- Initial release of annotations.
- Release the visualization code.
- Enrich the dataset.
SLP from HamNoSys | SLP from Word | SLP from ASL | SLP from GSL
For annotations, please fill out this form to request access to SignAvatars for non-commercial research purposes. By submitting the form, you confirm that you have read and agree to the terms of the Data license. You will then receive an email with download links for the motion and text labels.
We do not distribute the original RGB videos due to licensing restrictions; we provide the high-quality 3D motion labels annotated by our team. To download the original videos of the 4 subsets, please follow the instructions below:
- For the ASL subset, please download the Green Screen RGB clips from the How2Sign dataset and put them into `language2motion/`.
- For the HamNoSys subset, please download the original videos using the downloaded `HamNoSys/data.json`.
- For the GSL subset, please follow the official instructions to download the videos and put them into `language2motion/`.
- For the Word subset, please follow the official instructions to download the videos and put them into `word2motion/`.
After downloading the data, please organize `dataset/` as follows:
|-- dataset
| |-- hamnosys2motion/
| | |-- images/
| | | |-- <video_name>/
| | | | |-- <frame_number.jpg> [ starts from 000000.jpg ]
| | |-- videos/
| | | |-- <video_name>/ [ ..... ]
| | |-- annotations/
| | | |-- <annotation_type> [ SMPL-X, MANO, ...]
| | | | |-- <video_name.pkl>
| | |-- data.json [Text annotations]
| | |-- split.pkl
| | |
| |-- language2motion/
| | |-- images/
| | | |-- <video_name>/
| | | | |-- <frame_number.jpg> [ starts from 000000.jpg ]
| | |-- videos/
| | | |-- <video_name>/ [ ..... ]
| | |-- annotations/
| | | |-- <annotation_type> [ SMPL-X, MANO, ...]
| | | | |-- <video_name.pkl>
| | |-- text/
| | | |-- how2sign_train.csv [Text annotations]
| | | |-- how2sign_test.csv [Text annotations]
| | | |-- how2sign_val.csv [Text annotations]
| | | |-- PHOENIX-2014-T.train.corpus.csv [Text annotations]
| | | |-- PHOENIX-2014-T.test.corpus.csv [Text annotations]
| | |
| |-- word2motion/
| | |-- images/
| | | |-- <video_name>/
| | | | |-- <frame_number.jpg> [ starts from 000000.jpg ]
| | |-- videos/
| | | |-- <video_name>/ [ ..... ]
| | |-- annotations/
| | | |-- <annotation_type> [ SMPL-X, MANO, ...]
| | | | |-- <video_name.pkl>
| | |-- text/
| | | |-- WLASL_v0.3.json [Text annotations]
| | |
|-- common
| |-- utils
| | |-- human_model_files
| | | |-- smpl
| | | | |-- SMPL_NEUTRAL.pkl
| | | | |-- SMPL_MALE.pkl
| | | | |-- SMPL_FEMALE.pkl
| | | |-- smplx
| | | | |-- MANO_SMPLX_vertex_ids.pkl
| | | | |-- SMPL-X__FLAME_vertex_ids.npy
| | | | |-- SMPLX_NEUTRAL.pkl
| | | | |-- SMPLX_to_J14.pkl
| | | | |-- SMPLX_NEUTRAL.npz
| | | | |-- SMPLX_MALE.npz
| | | | |-- SMPLX_FEMALE.npz
| | | |-- mano
| | | | |-- MANO_LEFT.pkl
| | | | |-- MANO_RIGHT.pkl
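If a subset only provides videos, the per-frame JPEGs under `images/<video_name>/` have to be extracted to match the layout above. Below is a minimal sketch of one way to do this with OpenCV; the helper name and paths are placeholders and it is not part of the official tooling:

# extract_frames.py -- hypothetical helper, not part of the official release.
# Writes images/<video_name>/000000.jpg, 000001.jpg, ... from a downloaded video.
import os
import cv2  # assumes opencv-python is installed

def extract_frames(video_path: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()

# Example:
# extract_frames("dataset/word2motion/videos/<video_name>.mp4",
#                "dataset/word2motion/images/<video_name>")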
In the `common/` folder, `human_model_files` contains the `smpl`, `smplx`, `mano`, and `flame` 3D model files. Download the files from [SMPL_NEUTRAL], [SMPL_MALE.pkl and SMPL_FEMALE.pkl], [smplx], [SMPLX_to_J14.pkl], and [mano]. Alternatively, you can directly download our packed model files from Dropbox and unzip them to `human_model_files`.
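Once the model files are in place, a quick sanity check is to instantiate the SMPL-X layer with the smplx Python package. This is only a sketch under assumptions (the model_path follows the layout above, and use_pca=False assumes full axis-angle hand poses); it is not part of the official codebase:

import smplx

# Build the SMPL-X body model from the downloaded files.
smplx_layer = smplx.create(
    model_path='common/utils/human_model_files',  # folder containing smplx/, smpl/, mano/
    model_type='smplx',
    gender='neutral',
    use_pca=False,               # assumption: full 45-D axis-angle hand poses
    num_betas=10,
    num_expression_coeffs=10,
)
print(type(smplx_layer).__name__, smplx_layer.faces.shape)  # model class and face array shape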
In each of the .pkl files, the keys and their shapes are as follows:
width, height: (1,), (1,) — the video width and height
focal: (num_frames, 2)
princpt: (num_frames, 2)
2d: (num_frames, 106, 3)
pred2d: (num_frames, 106, 3)
total_valid_index: (num_frames,)
left_valid: (num_frames,)
right_valid: (num_frames,)
bb2img_trans: (num_frames, 2, 3)
smplx: (num_frames, 182)
unsmooth_smplx: (num_frames, 169)
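A minimal sketch of loading one annotation file and inspecting these keys, assuming the .pkl files are plain pickle dumps of a dictionary of numpy arrays (the path below is a placeholder):

import pickle
import numpy as np

# Load one per-video annotation file and print each key with its array shape.
with open('dataset/word2motion/annotations/SMPL-X/<video_name>.pkl', 'rb') as f:
    results_dict = pickle.load(f)

for key, value in results_dict.items():
    print(key, value.shape if isinstance(value, np.ndarray) else value)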
For motion generation and motion prior learning tasks, you should use the data in `smplx` for better stability, whilst `unsmooth_smplx` can be used for pose estimation tasks. Please refer to the code for more details. For example, you can extract the smplx parameters as follows:
# Smoothed whole-body parameters (182-D per frame):
# root (3) + body (63) + left hand (45) + right hand (45) + jaw (3) + shape (10) + expression (10) + camera translation (3)
all_parameters = results_dict['smplx']
root_pose, body_pose, left_hand_pose, right_hand_pose, jaw_pose, shape, expression, cam_trans = \
    all_parameters[:, :3], all_parameters[:, 3:66], all_parameters[:, 66:111], all_parameters[:, 111:156], \
    all_parameters[:, 156:159], all_parameters[:, 159:169], all_parameters[:, 169:179], all_parameters[:, 179:182]

# Unsmoothed parameters (169-D per frame): no jaw pose or expression in this version.
all_parameters = results_dict['unsmooth_smplx']
root_pose, body_pose, lhand_pose, rhand_pose, shape, cam_trans = \
    all_parameters[:, :3], all_parameters[:, 3:66], all_parameters[:, 66:111], all_parameters[:, 111:156], \
    all_parameters[:, 156:166], all_parameters[:, 166:169]
root_pose: (num_frames, 3)
body_pose: (num_frames, 63)
expression: (num_frames, 10)
jaw_pose: (num_frames, 3)
betas: (num_frames, 10)
left_hand_pose: (num_frames, 45)
right_hand_pose: (num_frames, 45)
Please note that `transl` is set to `0` in these subsets, as there is no root position change in the videos.
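To turn these parameters into posed meshes, one option is to run the SMPL-X layer from the sanity-check sketch above on the arrays extracted in the snippet above. This is a hedged sketch: the zero translation follows the note above, while the zero eye poses and the smplx flags (e.g. flat_hand_mean) are assumptions that may need adjusting:

import torch

num_frames = root_pose.shape[0]
to_t = lambda x: torch.tensor(x, dtype=torch.float32)  # numpy -> float32 tensor

output = smplx_layer(                         # smplx_layer from the sketch above
    global_orient=to_t(root_pose),
    body_pose=to_t(body_pose),
    left_hand_pose=to_t(left_hand_pose),
    right_hand_pose=to_t(right_hand_pose),
    jaw_pose=to_t(jaw_pose),
    betas=to_t(shape),
    expression=to_t(expression),
    transl=torch.zeros(num_frames, 3),        # transl is 0 in these subsets (see note above)
    leye_pose=torch.zeros(num_frames, 3),     # eye poses are not annotated; assume zeros
    reye_pose=torch.zeros(num_frames, 3),
)
vertices = output.vertices.detach().numpy()   # (num_frames, 10475, 3) SMPL-X mesh vertices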
HamNoSys subset:
- The signers are standing and performing a single sign.
- Each video is annotated with a HamNoSys glyph and HamNoSys text, e.g.:
"hamsymmlr,hamflathand,hamextfingero,hampalml"
- The average video length is 60 frames at 24 fps.
ASL subset:
- The signers are sitting and performing multiple signs.
- Each video is annotated with a natural language translation, e.g.:
"So we're going to start again on this one."
- The average video length is 162 frames at 24 fps.
Word subset:
- The signers are standing and performing a single sign.
- Each video is annotated with a word-level English label.
- The average video length is 57 frames at 24 fps.
Set up the virtual environment by running:
conda create -n signavatars python==3.8.8
conda activate signavatars
conda install -n signavatars pytorch==1.10.0 torchvision==0.11.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
Alternatively, you can visualize the .pkl files from our dataset:
python vis.py \
    --pkl_file_path <path_to_pkl_folder/file>
This will render the motion with its text annotation, and the results will be saved in `./render_results/`.
To visualize the motion overlay on the image, you need to first download the videos and run:
python vis.py \
    --pkl_file_path <path_to_pkl_folder/file> \
    --video_path <path_to_video_folder> \
    --overlay
Then, the results will be saved in `./render_results_overlay/` (the default body shape is used here).
@inproceedings{yu2024signavatars,
title={SignAvatars: A large-scale 3D sign language holistic motion dataset and benchmark},
author={Yu, Zhengdi and Huang, Shaoli and Cheng, Yongkang and Birdal, Tolga},
booktitle={European Conference on Computer Vision (ECCV)},
pages={1--19},
year={2024}
}
For technical questions, please contact ZhengdiYu@hotmail.com or z.yu23@imperial.ac.uk. For licensing questions, please contact shaolihuang@tencent.com.