Wespeaker Roadmap

Version 2.0 (Time: 2023.12)

This is the roadmap for wespeaker version 2.0.

  • SSL support
    • Algorithms
      • DINO
      • MOCO
      • SimCLR
      • Iterative pseudo-label prediction and supervised finetuning
    • Recipes
      • VoxCeleb
      • WenetSpeech
      • Gigaspeech
  • Recipes
    • 3D-speaker
    • NIST SRE
      • SRE16
      • SRE18
    • Documents
      • Speaker embedding learning basics
      • Core code explanation
      • Step-by-step tutorials
        • VoxCeleb Supervised
        • VoxCeleb Self-supervised
        • VoxSRC Diarization

Version 1.0 (Time: 2022.09)

This is the roadmap for wespeaker version 1.0.

  • Standard dataset support
    • VoxCeleb
    • CnCeleb
  • SOTA models support
    • x-vector (TDNN-based, milestone deep speaker embedding)
    • r-vector (ResNet-based, winner of VoxSRC 2019)
    • ECAPA-TDNN (variant of TDNN, winner of VoxSRC 2020)
  • Back-end Support
    • Cosine
    • EER/minDCF
    • AS-norm
    • PLDA
  • UIO for effective industrial-scale dataset processing
    • Online data augmentation
      • Noise && RIR
      • Speed Perturb
      • SpecAug
  • ONNX support
  • Triton Server support (GPU)
  • ~~
    • Training or finetuning large models such as WavLM may be too costly at the current stage
  • Basic Speaker Diarization Recipe
    • Embedding-based (more closely related to our speaker embedding learning toolkit)
  • Interactive Demo
    • Support using features from released pretrained models (Hugging Face)
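
For readers unfamiliar with the "Cosine" and "EER/minDCF" back-end items listed above, the following is a minimal illustrative sketch of cosine scoring and an approximate equal error rate. It is not Wespeaker's implementation: the function names `cosine_score` and `compute_eer` are hypothetical, the inputs are toy values, and the EER is approximated by a simple threshold sweep over the observed scores.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two speaker embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compute_eer(scores, labels):
    """Approximate EER (hypothetical helper, not a toolkit API).

    scores: trial scores, higher means "more likely same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Sweeps each observed score as a threshold and returns the mean of
    the false-accept and false-reject rates where they are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    best_gap, eer = float("inf"), None
    for thr in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        false_accepts = sum(
            1 for s, l in zip(scores, labels) if s >= thr and l == 0
        )
        false_rejects = sum(
            1 for s, l in zip(scores, labels) if s < thr and l == 1
        )
        far = false_accepts / n_nontarget
        frr = false_rejects / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy embeddings and trials, for illustration only.
    print(cosine_score([1.0, 0.0], [1.0, 0.0]))
    print(compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

A threshold sweep is quadratic in the number of trials and is only suitable for small examples; large trial lists are usually handled by sorting the scores once and accumulating error counts.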

Current Support List