This repository has been archived by the owner on Mar 8, 2023. It is now read-only.

Corpora collector (#129)
eziolotta authored Mar 16, 2021
1 parent 0b3f29c commit 7cdd498
Showing 3 changed files with 674 additions and 89 deletions.
41 changes: 41 additions & 0 deletions MITADS-Speech/assets/corpora_collector/mitads-speech-part1.yaml
@@ -0,0 +1,41 @@
##
name: 'mitads-speech-part1'
version: '0.1'
description: 'MITADS-Speech Dataset, filter audio more than 20 second'

## to be more usable we split final corpora into parts
split_final_dataset: 18
csv_rel_path_linux: True
corpus2collect:
  ##evalita2009:

  ##  filter:
  ##    max_duration: 20
  ##mspka:
  ##  filter:
  ##    max_duration: 20
  ##siwis:
  ####  filter:
  ##    max_duration: 20

  m-ailabs:
    filter:
      max_duration: 20

  mls:
    filter:
      max_duration: 20
      comments_contains:
        ## filter ancient work by author
        - Dante Alighieri
        - Giovanni Francesco Straparola
        - Niccolò Machiavelli
        ##filter title book that is present in m-ailabs
        - Novelle per un anno
        - Galatea
        - Il fu Mattia Pascal
        - Ritratto del Diavolo
        - Contessa di Karolystria
        - Le meraviglie del Duemila
        - Malavoglia
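
For readers unfamiliar with the collector, the following is a minimal sketch of how the two filter keys above might be applied. It is not the collector's actual code. It assumes that `max_duration` drops clips longer than the given number of seconds, that `comments_contains` sits under `filter` and excludes clips whose comment metadata contains any listed string, and that each clip carries a duration and a free-text comment. The `Clip` class, `keep_clip` function, and sample file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical record for one audio clip (not the collector's own type)."""
    path: str
    duration: float  # length in seconds
    comments: str    # free-text metadata, e.g. author and book title


def keep_clip(clip, corpus_filter):
    """Return True if the clip passes the (assumed) filter semantics."""
    # Assumption: clips strictly longer than max_duration are dropped.
    max_duration = corpus_filter.get("max_duration")
    if max_duration is not None and clip.duration > max_duration:
        return False
    # Assumption: a clip is dropped when its comments contain any listed string.
    for needle in corpus_filter.get("comments_contains", []):
        if needle in clip.comments:
            return False
    return True


# A shortened version of the `mls` filter from the config above.
mls_filter = {
    "max_duration": 20,
    "comments_contains": ["Dante Alighieri", "Il fu Mattia Pascal"],
}

# Made-up sample clips for illustration only.
clips = [
    Clip("a.wav", 12.5, "Giovanni Verga - Mastro-don Gesualdo"),
    Clip("b.wav", 25.0, "Giovanni Verga - Mastro-don Gesualdo"),
    Clip("c.wav", 8.0, "Dante Alighieri - Divina Commedia"),
]

kept = [c.path for c in clips if keep_clip(c, mls_filter)]
print(kept)  # ['a.wav']
```

Under these assumptions, `b.wav` is rejected for exceeding 20 seconds and `c.wav` for matching an excluded author, which mirrors the intent stated in the config comments.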
