Nexdata-AI/101-Hours-Italian-Children-Spontaneous-Speech-Data


101-Hours-Italian-Children-Spontaneous-Speech-Data

Description

This dataset contains 101 hours of spontaneous Italian speech from children, manually screened and processed. Annotations include the transcription text, speaker identification, gender, and other information. The dataset can be applied to speech recognition (acoustic-model or language-model training), caption generation, voice content moderation, and other AI algorithm research.

For more details, please refer to the link: https://www.nexdata.ai/datasets/1300?source=Github

Specifications

Format

16 kHz, 16-bit, WAV, mono channel
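As a quick sanity check on downloaded audio, the stated format can be verified with Python's standard `wave` module. This is a minimal sketch, not part of the dataset's own tooling: the helper names `check_wav_format` and `duration_seconds` are hypothetical, and the sketch assumes the files are uncompressed PCM WAV, which `wave` requires.

```python
import wave

# Expected parameters, taken from the Format specification above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path: str) -> list[str]:
    """Hypothetical helper: list mismatches between a WAV file and the stated format.

    An empty list means the file header reports 16 kHz, 16-bit, mono.
    """
    problems: list[str] = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate {wav.getframerate()} Hz != {EXPECTED_RATE_HZ} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width {8 * wav.getsampwidth()} bit != 16 bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels != 1 (mono)")
    return problems


def duration_seconds(path: str) -> float:
    """Hypothetical helper: duration of a WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Summing `duration_seconds` over all files and dividing by 3600 gives the total duration in hours, which can be compared with the stated 101 hours. The dataset's directory layout is not documented here, so locating the files is left to the reader.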

Age

Children aged 12 years and younger

Content category

Self-media, conversations, live streams, lectures, and variety shows

Language

Italian

Annotation

Transcription text, speaker identification, and gender

Accuracy

Word Accuracy Rate (WAR) of at least 98%
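The README does not define how WAR is computed. Assuming the common convention WAR = 1 - WER = 1 - (S + D + I) / N, where S, D, and I are word substitutions, deletions, and insertions against a reference of N words, a minimal Python sketch follows. The function names are hypothetical, and no text normalization (case, punctuation) is applied before comparison.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance: minimum substitutions + deletions + insertions."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical WAR for one utterance, assuming WAR = 1 - WER.

    Can drop below 0 when the hypothesis contains many insertions.
    """
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / n


def corpus_war(pairs: list[tuple[str, str]]) -> float:
    """Hypothetical corpus-level WAR: pool errors and reference words over all pairs."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return 1.0 - errors / words
```

For example, `word_accuracy_rate("il gatto è sul tavolo", "il gatto sul tavolo")` returns 0.8 (one deletion out of five reference words). `corpus_war` pools errors across utterances, which weights long utterances more heavily than averaging per-utterance scores would.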

Licensing Information

Commercial License