Scripts for preprocessing COG-MHEAR Audio-Visual Speech Enhancement Challenge (AVSEC)

cogmhear/avsec_preprocessing

Scripts for preprocessing audio-visual speech enhancement challenge (AVSEC) data

This script can be used to extract the following features:

  • FaceMesh landmarks [1]
  • lip images cropped using the landmarks
  • face embeddings using FaceNet [2]
  • lip embeddings using TCN [3]
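
As an illustration of how lip images can be derived from landmarks, here is a minimal sketch that computes a padded crop box around a set of lip points. It is not taken from this repository: the function name `lip_bounding_box`, the `margin` parameter, and the assumption that landmarks are already in pixel coordinates are all hypothetical.

```python
def lip_bounding_box(lip_points, image_width, image_height, margin=0.2):
    """Return (left, top, right, bottom) of a padded box around lip landmarks.

    lip_points: iterable of (x, y) pixel coordinates (assumed, not normalized).
    margin: fraction of the box size added as padding on each side (hypothetical).
    """
    xs = [x for x, _ in lip_points]
    ys = [y for _, y in lip_points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    # Pad the tight box so the crop keeps some context around the mouth.
    pad_x = (x_max - x_min) * margin
    pad_y = (y_max - y_min) * margin

    # Clamp to the image so the box can be used directly for slicing.
    left = max(0, int(x_min - pad_x))
    top = max(0, int(y_min - pad_y))
    right = min(image_width, int(x_max + pad_x) + 1)
    bottom = min(image_height, int(y_max + pad_y) + 1)
    return left, top, right, bottom
```

With an image stored as a row-major array, the crop would then be `image[top:bottom, left:right]`.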

Requirements

## CPU 
pip install -r requirements.txt

## GPU
pip install -r requirements_gpu.txt

## Apple Silicon
pip install -r requirements_mac.txt

Usage

python main.py --data-dir ./data/train/scenes \
               --save-dir ./preprocessed/train \
               --models-root ./models \
               --all-feat
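
To preprocess more than one split, the command above can be wrapped in a small driver. The sketch below reuses only the flags shown in the usage example. The split names other than `train`, the `./data/<split>/scenes` layout for those splits, and the helper names `build_command` and `preprocess_splits` are assumptions for illustration.

```python
import subprocess
import sys


def build_command(split, data_root="./data", save_root="./preprocessed",
                  models_root="./models"):
    """Build the main.py invocation for one split (directory layout is assumed)."""
    return [
        sys.executable, "main.py",
        "--data-dir", f"{data_root}/{split}/scenes",
        "--save-dir", f"{save_root}/{split}",
        "--models-root", models_root,
        "--all-feat",
    ]


def preprocess_splits(splits, dry_run=True):
    """Run (or, with dry_run, only print) the preprocessing command per split."""
    for split in splits:
        cmd = build_command(split)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first split that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "dev" is a hypothetical split name; adjust to the folders you actually have.
    preprocess_splits(["train", "dev"], dry_run=True)
```

Leaving `dry_run=True` prints the commands so the paths can be checked before any features are extracted.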

References
