The Chinese University of Hong Kong, Shenzhen
Stars
Code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
A family of state-of-the-art Transformer-based audio codecs for low-bitrate, high-quality audio coding.
Ultra-low-bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
Web interface for browsing, searching, and filtering recent arXiv submissions.
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
First base model for full-duplex conversational audio.
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
An open-source, LLM-empowered foundation TTS system.
A PyTorch implementation of Finite Scalar Quantization (see the sketch after this list).
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.
AudioBench: A Universal Benchmark for Audio Large Language Models
Neural Networks: Zero to Hero
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Text-to-speech using an autoregressive transformer and VITS.
Foundational model for human-like, expressive TTS
Awesome speech/audio LLMs, representation learning, and codec models
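
As a rough orientation for the Finite Scalar Quantization entry above, here is a minimal, simplified PyTorch sketch of the core rounding-with-straight-through step (it omits the per-channel offset used for even level counts in the FSQ paper). The function name fsq_quantize and the example level configuration are illustrative assumptions, not the listed repository's actual API.

import torch

def fsq_quantize(z: torch.Tensor, levels: list[int]) -> torch.Tensor:
    # Simplified sketch of Finite Scalar Quantization (hypothetical helper,
    # not the listed repo's API). z has shape (..., len(levels)).
    L = torch.tensor(levels, dtype=z.dtype, device=z.device)
    half = (L - 1) / 2
    # Bound each channel, then round it to one of levels[i] integer values.
    bounded = torch.tanh(z) * half
    quantized = torch.round(bounded)
    # Straight-through estimator: rounded values in the forward pass,
    # identity gradient in the backward pass.
    return bounded + (quantized - bounded).detach()

# Example: four channels with levels [8, 8, 5, 5] give an implicit
# codebook of 8 * 8 * 5 * 5 = 1600 codes.
codes = fsq_quantize(torch.randn(2, 10, 4), levels=[8, 8, 5, 5])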