TTS audio files for the CC-CEDICT Chinese-English dictionary

cjhoward/cedict-tts

CEDICT TTS

This repository contains TTS MP3s, generated with Baidu Speech, for (nearly) all entries in the CC-CEDICT Chinese-English dictionary, along with the Python script and CC-CEDICT dictionary file used to generate them. The female and male directories contain the MP3s generated with the female (0) and male (1) voices, respectively. All MP3s were generated using the default speed (5), pitch (5), and volume (5). For higher-quality audio, change the encoding to WAV in the script and regenerate the files.

Audio Usage

All audio file names are lowercase and are derived from the pinyin pronunciation of the entry. The pinyin is numeric (i.e. pin1yin1, not pīnyīn), 'v' is used in place of 'ü', and 'er5' is used rather than 'r5'. Following these rules, the MP3 for '律师' is lv4shi1.mp3.
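The naming rules above can be sketched as a small helper. This is only an illustration, not the code used in tts.py: the function name `pinyin_to_filename` is hypothetical, and it assumes the input is CC-CEDICT-style pinyin, with syllables separated by spaces and 'ü' written as either 'u:' or 'ü'.

```python
def pinyin_to_filename(pinyin: str) -> str:
    """Turn space-separated numeric pinyin into an MP3 file name.

    Hypothetical helper: assumes CC-CEDICT-style input such as "lu:4 shi1".
    """
    normalized = []
    for syllable in pinyin.lower().split():
        # 'v' stands in for 'ü' (written 'u:' in CC-CEDICT).
        syllable = syllable.replace("u:", "v").replace("ü", "v")
        # A standalone erhua syllable 'r5' is named 'er5'.
        if syllable == "r5":
            syllable = "er5"
        normalized.append(syllable)
    return "".join(normalized) + ".mp3"


print(pinyin_to_filename("lu:4 shi1"))
print(pinyin_to_filename("yi1 dian3 r5"))
```

Working syllable by syllable keeps an existing 'er5' untouched while still rewriting a bare 'r5'.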

To get the correct pronunciation for words containing '一' or '不', use the tone-corrected pinyin wherever the obligatory tone change rules apply. For example, the pinyin for '一切' should be 'yi2qie4' rather than 'yi1qie4'.
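As a rough sketch of the tone correction, the commonly taught rules are: '不' (bu4) becomes bu2 before a fourth tone, and '一' (yi1) becomes yi2 before a fourth tone and yi4 before a first, second, or third tone. The function below is hypothetical and deliberately simplified: it assumes exactly one pinyin syllable per character, leaves a word-final '一' alone, and does not recognise numbers or ordinals (such as '一月'), where '一' normally keeps its first tone, so its output for those entries needs manual review.

```python
def apply_tone_sandhi(hanzi: str, syllables: list[str]) -> list[str]:
    """Return syllables with simplified tone changes for '一' and '不'.

    Hypothetical sketch: assumes one numeric-pinyin syllable per character.
    """
    # If characters and syllables do not line up, change nothing.
    if len(hanzi) != len(syllables):
        return list(syllables)

    result = list(syllables)
    # The last character has no following syllable, so it is never changed.
    for i, char in enumerate(hanzi[:-1]):
        next_tone = syllables[i + 1][-1:]
        if char == "不" and syllables[i] == "bu4" and next_tone == "4":
            result[i] = "bu2"
        elif char == "一" and syllables[i] == "yi1":
            if next_tone == "4":
                result[i] = "yi2"
            elif next_tone in ("1", "2", "3"):
                result[i] = "yi4"
    return result


print("".join(apply_tone_sandhi("一切", ["yi1", "qie4"])))
```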

Anki

This data is particularly useful when creating Anki flashcards. First, copy all audio files from either the female or male directory to your Anki media folder. Then, provided that your card has a pinyin field whose contents match the file names described above, add the line [sound:{{pinyin}}.mp3] to your card template to automatically add pronunciation audio to all cards.
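The copy step can be scripted. The sketch below is an assumption-laden convenience, not part of this repository: `copy_audio` is a hypothetical name, and the location of the Anki media folder (named collection.media inside your profile folder) depends on your operating system and profile, so it must be passed in explicitly.

```python
import shutil
from pathlib import Path


def copy_audio(source_dir: str, media_dir: str) -> int:
    """Copy every .mp3 from source_dir into media_dir; return how many were copied.

    Hypothetical helper: existing files in media_dir are never overwritten.
    """
    source = Path(source_dir)
    target = Path(media_dir)
    if not target.is_dir():
        raise FileNotFoundError(f"media folder not found: {target}")

    copied = 0
    for mp3 in sorted(source.glob("*.mp3")):
        destination = target / mp3.name
        if not destination.exists():
            shutil.copy2(mp3, destination)
            copied += 1
    return copied
```

For example, `copy_audio("female", "/path/to/collection.media")` would copy the female-voice files; the second argument is a placeholder for your own media folder path.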

Script Usage

To generate TTS audio using Baidu's speech synthesis service, you must install the Baidu RESTful API Python SDK and obtain an API key for the Baidu Speech API. Instructions for this process (in Chinese) can be found here.

Once you have been issued an API key, note your app ID, API key, and secret key, as these are passed as parameters to the TTS script. Then run the script as follows:

python3 tts.py <app ID> <API key> <secret key>

This will generate MP3s for all entries in the cedict_ts.u8 file. Feel free to edit the script manually to change the speed, pitch, volume, person, and encoding parameters as desired. See the Baidu TTS REST API (Chinese) for more information about the parameters.
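For orientation, a single synthesis request might look roughly like the sketch below. It is not taken from tts.py. It assumes the SDK exposes an `AipSpeech` client in the `aip` module with a `synthesis(text, lang, ctp, options)` method that returns audio bytes on success and a dict on error, and that the option keys `spd`, `pit`, `vol`, and `per` correspond to the speed, pitch, volume, and person parameters. The helper names are hypothetical.

```python
def make_client(app_id: str, api_key: str, secret_key: str):
    """Create the Baidu speech client (assumed SDK entry point)."""
    # Imported lazily so the other helpers work without the SDK installed.
    from aip import AipSpeech
    return AipSpeech(app_id, api_key, secret_key)


def build_options(speed: int = 5, pitch: int = 5, volume: int = 5, person: int = 0) -> dict:
    """Build the options dict; the defaults mirror those used for this repository."""
    # person 0 is the female voice and 1 the male voice, per the README.
    return {"spd": speed, "pit": pitch, "vol": volume, "per": person}


def synthesize_to_file(client, text: str, path: str, options: dict) -> bool:
    """Request audio for text and write it to path; return False on an API error."""
    result = client.synthesis(text, "zh", 1, options)
    # Assumption: an error comes back as a dict, audio as raw bytes.
    if isinstance(result, dict):
        return False
    with open(path, "wb") as audio_file:
        audio_file.write(result)
    return True
```

Passing the client in as an argument keeps the request logic testable without network access or credentials.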

Licenses

| Name | Author(s) | License |
| --- | --- | --- |
| tts.py | Christopher J. Howard | zlib License |
| MP3 files | Christopher J. Howard | CC0 1.0 Universal |
| CC-CEDICT | MDBG | CC BY-SA 3.0 |
| Baidu RESTful API Python SDK | Baidu | Apache License, Version 2.0 |
