documentation

[ ] have more detailed command description on ‘abkhazia <command> –help’. Assume the user doesn’t know abkhazia or kaldi.
[-] improve the online documentation
- [X] install and configure
- [ ] preparators for new corpora
- [ ] ASR: LM, AM, features, alignment, etc
- [ ] ABX: item files generation from alignment
[ ] add READMEs in intermediate directories of the project to explain the sources tree (or have a ‘project organization’ page)

corpus [1/4]

[X] split with (p_train + p_test) < 1 (by now only =1 supported)
[ ] des méthodes de stats sur les corpus (duration, per spks, etc)
[ ] need of a structure to append data to an existing corpus (eg attach a LM, features or manual alignments)
[ ] From the Thomas’s notes we want… Optionally, the corpus directory may contains:
- Time-alignments (The previous files in the list are sufficient for training ASR models with kaldi, however for ABX experiments we also need time-alignments for each phone or each word. If they are not provided directly, a surrogate will be obtained through forced-alignment using the kaldi-trained ASR models)
- A language model, either in OpenFST text or binary format or in ARPA-MIT N-gram format
- Syllabification of each utterance

preparators [2/5]

[X] GlobalPhone & CSJ : Laisser la possibilité de choisir de garder ou non le ‘+’ (pour l’instant on l’enlève)
[X] Chercher les transcriptions manquantes pour GlobalPhone (pour l’instant hard codé dans préparateur)
[ ] Faire un dossier GlobalPhone et un dossier CSJ pour les différents cas possible pour chaque corpus
[ ] Choisir CLAIREMENT la façon de traiter les parenthèses dans CSJ (pour l’instant :
- (W ..1;..2) : on garde ..1
- (?..) : on remplace le SUW par “?”
- (R xxx) : on remplace par “?”
- reste : on enlève caractère + parenthèse et on garde tout
[ ] On enlève les IPU lorsqu’il y a un caractère problèmatique (e.g. “(?” ou “(W” ou autre) dans un des SUW ?

acoustic [0/2]

[ ] –retrain option it should be possible to retrain a trained model on a new corpus (for instance, specifically retrain silence models, or retrain on a bunch of new corpus)
[ ] questions vs data-driven option (is it for tree building ?)

prepare_lang branch [1/2]

AM does not depend on LM [3/3]

[X] write a prepare_lang.sh standalone wrapper
[X] independant call to prepare_lang in AM classes
[X] update the command line API in consequence

fix decode and align [3/7]

[X] pass all the tests
[X] refactor core -> am now exports the lang folder
[X] optionally disable the scoring in decode forward –skip-scoring from kaldi to abkhazia
[ ] check for phonesets compatibility between AM/LM
[ ] add tests for cross use of AMs/LMs
[ ] add tests for wpd decoding
[ ] update the command line API in consequence

notes

required files by mkgraph.sh are

AM $lang/L.fst
LM $lang/G.fst
AM/LM $lang/phones.txt
AM? $lang/words.txt
AM/LM? $lang/phones/silence.csl
AM/LM? $lang/phones/disambig.int
AM $model/final.mdl
AM $model/tree

Minor changes [0/8]

upgrade the bootphon/kaldi repo

add cuda support in the install_kaldi.sh script

in export use symlinks to save some place

(file in output-dir, link in recipe-dir)

updating abkhazia.cfg

Need of an automated way to update new versions of the installed configuration file in the ./configure script.

adjust log level by detecting WARNING and ERROR from Kaldi messages

Those messages are actually logged in debug level, should be smater to be warnong/error 2016-10-12 16:33:58,372 - DEBUG - ERROR (apply-cmvn:Write():kaldi-matrix.cc:1229) Failed to write matrix to stream 2016-10-12 16:33:58,373 - DEBUG - WARNING (apply-cmvn:Write():util/kaldi-holder-inl.h:54) Exception caught writing Table object: ERROR (apply-cmvn:Write():kaldi-matrix.cc:1229) Failed to write matrix to stream

Have completion setup during installation (or configuration?)

Have a test_commands module for testing command line interface

New specifications (0.4)

corpus = BuckeyeCorpusPreparator('./buckeye').prepare()
corpus.speakers()
utt = corpus.utterances()

train, _ = corpus.split(train_prop=0.5, by_speakers=True)
train.save2h5('train.h5', group='corpus', wavs=True)
corpus = Corpus.read('train.h5', group='corpus')

lm = LanguageModelProcessor(order=3, level='word').compute(corpus)
lm.save('lm.fst')
lm.save2h5('train.h5', group='word-trigram')
assert lm.order == 3
assert lm.level == 'word'

features = FeaturesProcessor('mfcc', delta=2, pitch=True).compute(corpus)
f = features[utt[0]]  # np.array
features.write2h5('train.h5', 'features')
features.write2ark('/somewhere')

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO.org

TODO.org

documentation

corpus [1/4]

preparators [2/5]

acoustic [0/2]

prepare_lang branch [1/2]

AM does not depend on LM [3/3]

fix decode and align [3/7]

notes

Minor changes [0/8]

upgrade the bootphon/kaldi repo

add cuda support in the install_kaldi.sh script

in export use symlinks to save some place

updating abkhazia.cfg

adjust log level by detecting WARNING and ERROR from Kaldi messages

Have completion setup during installation (or configuration?)

Have a test_commands module for testing command line interface

New specifications (0.4)

Files

TODO.org

Latest commit

History

TODO.org

File metadata and controls

documentation

corpus [1/4]

preparators [2/5]

acoustic [0/2]

prepare_lang branch [1/2]

AM does not depend on LM [3/3]

fix decode and align [3/7]

notes

Minor changes [0/8]

upgrade the bootphon/kaldi repo

add cuda support in the install_kaldi.sh script

in export use symlinks to save some place

updating abkhazia.cfg

adjust log level by detecting WARNING and ERROR from Kaldi messages

Have completion setup during installation (or configuration?)

Have a test_commands module for testing command line interface

New specifications (0.4)