CEDICT TTS

This repository contains Baidu Speech-generated TTS MP3s for (nearly) all entries in the CC-CEDICT Chinese-English dictionary, along with the Python script and CC-CEDICT dictionary file used to generate them. The female and male directories contain the generated MP3s in female (0) and male (1) voices, respectively. All MP3s were generated using the default speed (5), pitch (5), and volume (5). Higher quality audio can be obtained by changing the encoding to WAV and regenerating the audio.
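The voice, speed, pitch, volume, and encoding settings above correspond to request options of the Baidu TTS service. The sketch below shows one way to collect them. The helper `build_tts_options` is hypothetical and is not taken from tts.py. The option names `spd`, `pit`, `vol`, `per`, and `aue`, along with the `aue` codes 3 (MP3) and 6 (WAV), are assumptions based on Baidu's REST API documentation and should be checked against the current docs:

```python
# Hypothetical helper, not part of tts.py.
# Option names and codes are ASSUMPTIONS based on Baidu's REST API docs:
#   per: voice (0 = female, 1 = male), spd/pit/vol: speed/pitch/volume,
#   aue: audio encoding (3 = MP3, 6 = WAV).
VOICES = {"female": 0, "male": 1}
ENCODINGS = {"mp3": 3, "wav": 6}


def build_tts_options(voice="female", encoding="mp3", speed=5, pitch=5, volume=5):
    """Return (options, file_extension) for one synthesis request."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if encoding not in ENCODINGS:
        raise ValueError(f"unknown encoding: {encoding!r}")
    options = {
        "spd": speed,
        "pit": pitch,
        "vol": volume,
        "per": VOICES[voice],
        "aue": ENCODINGS[encoding],
    }
    return options, "." + encoding
```

With the defaults, this reproduces the settings used for the published files. `build_tts_options("male", "wav")` would request the higher-quality WAV output in the male voice.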

Audio Usage

All audio filenames are lowercase and are named after their pinyin pronunciations. The pinyin uses tone numbers (e.g. pin1yin1, not pīnyīn), 'v' is used in place of 'ü', and 'er5' is used rather than 'r5'. Following these rules, the MP3 for '律师' is lv4shi1.mp3.
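These naming rules can be applied mechanically. The following is a minimal sketch, assuming the input is CC-CEDICT-style numeric pinyin with space-separated syllables (such as "lu:4 shi1", where CC-CEDICT writes 'ü' as 'u:'). `pinyin_to_filename` is a hypothetical helper. It does not strip punctuation that some dictionary entries contain.

```python
def pinyin_to_filename(pinyin: str, extension: str = ".mp3") -> str:
    """Convert numeric pinyin such as "lu:4 shi1" to an audio filename.

    Rules applied: lowercase, syllables joined without spaces,
    'u:' or 'ü' written as 'v', and a bare 'r5' syllable written as 'er5'.
    """
    syllables = []
    for syllable in pinyin.lower().split():
        syllable = syllable.replace("u:", "v").replace("ü", "v")
        if syllable == "r5":  # erhua suffix is stored as er5
            syllable = "er5"
        syllables.append(syllable)
    return "".join(syllables) + extension
```

For example, `pinyin_to_filename("lu:4 shi1")` returns "lv4shi1.mp3", and `pinyin_to_filename("yi1 dian3 r5")` returns "yi1dian3er5.mp3".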

To get the correct pronunciation for words containing '一' or '不', use the tone-changed pinyin wherever the obligatory tone change rules apply. For example, the pinyin for '一切' should be 'yi2qie4' rather than 'yi1qie4'.
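The basic tone changes can be applied to dictionary pinyin before building a filename. This is a simplified sketch, assuming one pinyin syllable per character. It covers only the standard rules: 不 becomes bu2 before a fourth tone, and 一 becomes yi2 before a fourth tone and yi4 before a first, second, or third tone. It does not detect most cases where 一 keeps its first tone, such as numbers and dates. The only exception it handles is the ordinal 第一. `apply_yi_bu_sandhi` is a hypothetical helper.

```python
def apply_yi_bu_sandhi(hanzi, syllables):
    """Return a copy of `syllables` with the basic 一/不 tone changes applied.

    hanzi: the word's characters, e.g. "一切"
    syllables: its numeric pinyin syllables, e.g. ["yi1", "qie4"]
    """
    if len(hanzi) != len(syllables):
        return list(syllables)  # cannot align characters with syllables
    result = list(syllables)
    for i in range(len(hanzi) - 1):
        next_tone = syllables[i + 1][-1]  # tone digit of the following syllable
        if hanzi[i] == "不" and syllables[i] == "bu4" and next_tone == "4":
            result[i] = "bu2"
        elif hanzi[i] == "一" and syllables[i] == "yi1":
            if i > 0 and hanzi[i - 1] == "第":
                continue  # ordinal (第一): 一 keeps its first tone
            if next_tone == "4":
                result[i] = "yi2"
            elif next_tone in ("1", "2", "3"):
                result[i] = "yi4"
    return result
```

Combined with the naming rules, `"".join(apply_yi_bu_sandhi("一切", ["yi1", "qie4"])) + ".mp3"` gives yi2qie4.mp3.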

Anki

This data is particularly useful for creating Anki flashcards. First, copy all audio files from either the female or the male directory into your Anki media folder. Then, provided that your note type has a field named pinyin in the format described above, add [sound:{{pinyin}}.mp3] to your card template to add pronunciation audio to every card automatically.
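The copy step can be scripted. The following is a minimal sketch, assuming you already know the path of your profile's collection.media folder. That location depends on the operating system and the Anki profile, so the function takes it as an argument instead of guessing it. `copy_audio_to_anki` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def copy_audio_to_anki(source_dir, media_dir, overwrite=False):
    """Copy every .mp3 in source_dir (e.g. "female" or "male") into Anki's
    media folder and return the number of files copied."""
    source = Path(source_dir)
    target = Path(media_dir)
    if not target.is_dir():
        raise NotADirectoryError(f"Anki media folder not found: {target}")
    copied = 0
    for mp3 in sorted(source.glob("*.mp3")):
        destination = target / mp3.name
        if destination.exists() and not overwrite:
            continue  # keep files already present in the collection
        shutil.copy2(mp3, destination)
        copied += 1
    return copied
```

A call might look like `copy_audio_to_anki("female", "/path/to/collection.media")`, where the second argument is a placeholder for your own media folder.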

Script Usage

To generate TTS audio with Baidu's speech synthesis service, you must install the Baidu RESTful API Python SDK and obtain an API key for the Baidu Speech API. Instructions for this process (in Chinese) can be found here.

After you have been issued an API key, note down your app ID, API key, and secret key. The TTS script takes these as command-line arguments. The script can then be run as follows:

python3 tts.py <app ID> <API key> <secret key>

This generates MP3s for all entries in the cedict_ts.u8 file. To change the speed, pitch, volume, person (voice), or encoding, edit the corresponding parameters in the script. See the Baidu TTS REST API documentation (in Chinese) for details on these parameters.
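If you want to adapt the script, the following rough sketch shows what generating audio for every entry might involve. It is not the code in tts.py, and all helper names are hypothetical. It synthesizes the simplified characters of each entry, which may be read incorrectly for characters with several pronunciations. It also assumes that the SDK's `synthesis(text, 'zh', 1, options)` call returns audio bytes on success and a dict on error. The client is passed in as an argument so that the logic can be tried without network access.

```python
import re
from pathlib import Path

# Matches "Traditional Simplified [pin1 yin1] /gloss/.../"
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /")


def parse_cedict(lines):
    """Yield (simplified, pinyin) pairs from CC-CEDICT lines, skipping comments."""
    for line in lines:
        if line.startswith("#"):
            continue
        match = CEDICT_LINE.match(line)
        if match:
            yield match.group(2), match.group(3)


def to_filename(pinyin, extension=".mp3"):
    """Apply the filename rules: lowercase, 'u:' -> 'v', bare 'r5' -> 'er5'."""
    syllables = [s.replace("u:", "v") for s in pinyin.lower().split()]
    return "".join("er5" if s == "r5" else s for s in syllables) + extension


def synthesize_all(client, lines, out_dir, options, extension=".mp3"):
    """Write one audio file per distinct pinyin and return the names written.

    ASSUMPTION: `client` behaves like the SDK's AipSpeech, i.e.
    client.synthesis(text, "zh", 1, options) returns bytes on success
    and an error dict on failure.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for simplified, pinyin in parse_cedict(lines):
        path = out / to_filename(pinyin, extension)
        if path.exists():
            continue  # same pinyin already synthesized (or left by an earlier run)
        result = client.synthesis(simplified, "zh", 1, options)
        if isinstance(result, dict):
            continue  # API error; skip this entry
        path.write_bytes(result)
        written.append(path.name)
    return written
```

With the SDK installed, `client` would be created as `AipSpeech(app_id, api_key, secret_key)` after `from aip import AipSpeech`. The dictionary would be passed in as the file object returned by `open("cedict_ts.u8", encoding="utf-8")`. Because existing files are skipped, an interrupted run can be resumed, and entries that share the same pinyin produce a single file.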

Licenses

Name	Author(s)	License
tts.py	Christopher J. Howard	zlib License
MP3 files	Christopher J. Howard	CC0 1.0 Universal
CC-CEDICT	MDBG	CC BY-SA 3.0
Baidu RESTful API Python SDK	Baidu	Apache License, Version 2.0
