Training & languages #4

Mlallena · 2021-07-08T08:59:40Z

If I wanted to add new languages to this program, or train the ones already present, how would I have to do it?

Also, you should update the link in the first instruction. I had to replace "latest" with "1.0.1" before I could download it.

igorsitdikov · 2021-07-08T20:05:13Z

Thank you. Unfortunately, you have to train your own model for a new language.
Or you can try https://huggingface.co/TalTechNLP/voxlingua107-epaca-tdnn

Mlallena · 2021-07-09T09:50:27Z

What did you use to train your own model? I'm asking because (unless I missed something) this repository doesn't have any code that is clearly used for training.

igorsitdikov · 2021-07-09T11:08:27Z

Have a look at #1.

Mlallena · 2021-07-09T11:10:28Z

Thanks, I'll have a look.

Mlallena · 2021-07-12T12:19:51Z

OK, I have been checking, and it could work. The thing is, from what you said in #1, the only modification you would make is to the utt2spk file, but where is this file stored? My guess is that it is stored in a data folder within v2, but the main problem is that the run.sh file doesn't refer to that file. I'd also have to change which corpus it targets, since the audio files are in a different folder.

Any help you can give me would be welcome.

asadullah797 · 2022-03-27T09:32:46Z

Hi Igor,
I am training a Kaldi recipe on VoxLingua data for a language identification task, but I could not find the trials file.
Could you please share the trials file with me?
Many thanks.

igorsitdikov · 2022-03-28T05:55:11Z

Hello @asadullah797. You can generate the file on your own.
It will look something like this:

lang-id-A utt-id-A target
lang-id-A utt-id-B nontarget
lang-id-A utt-id-C nontarget
lang-id-B utt-id-A nontarget
lang-id-B utt-id-B target

For 3 files and 3 languages:

en utt-en target
en utt-ru nontarget
en utt-pl nontarget
ru utt-en nontarget
ru utt-ru target
ru utt-pl nontarget
pl utt-en nontarget
pl utt-ru nontarget
pl utt-pl target

Sorry, I don't remember exactly; columns 1 and 2 may need to be swapped.
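
For anyone generating the file by script, here is a minimal Python sketch of the layout described above, assuming you already have a mapping from utterance ID to language. The function name `build_trials` and the `lang_first` flag are made up for illustration and are not part of this repository or of Kaldi. Since the column order is uncertain, set `lang_first=False` if the recipe expects the utterance ID in the first column.

```python
from typing import Dict, List


def build_trials(utt2lang: Dict[str, str], lang_first: bool = True) -> List[str]:
    """Return one trial line for every (language, utterance) pair.

    A pair is labelled "target" when the language is the utterance's own
    language, and "nontarget" otherwise.
    """
    languages = sorted(set(utt2lang.values()))
    lines = []
    for lang in languages:
        for utt, true_lang in utt2lang.items():
            label = "target" if lang == true_lang else "nontarget"
            # Column order is an assumption; swap it with lang_first=False.
            first, second = (lang, utt) if lang_first else (utt, lang)
            lines.append(f"{first} {second} {label}")
    return lines


if __name__ == "__main__":
    # Hypothetical utterance IDs taken from the example above.
    utt2lang = {"utt-en": "en", "utt-ru": "ru", "utt-pl": "pl"}
    for line in build_trials(utt2lang):
        print(line)
```

With N utterances and K languages this produces N x K lines, so 9 lines for the 3-file example.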

asadullah797 · 2022-03-28T06:00:19Z

For the lang id task, how do you define a line such as

lang-id-A utt-id-B nontarget

I mean, how do you decide whether a given utterance is target or nontarget?
Thanks

igorsitdikov · 2022-03-28T06:06:43Z

You have a dataset with 3 languages, and each wav file contains only one language. You should have a mapping from wav file to language; that language is the target for the file. The other 2 languages are nontarget for the file.

asadullah797 · 2022-03-28T06:10:45Z

Just to confirm:
(wav1:>en, wav2:>es, wav3:>de)
en wav1 target
es wav1 nontarget
de wav1 nontarget
and so on for other cases as well.

igorsitdikov · 2022-03-28T06:17:38Z

I think so. But as I wrote before, if it does not work, try swapping columns 1 and 2. Sorry, I really don't remember.
Like this:
wav1 en target
wav1 es nontarget
wav1 de nontarget

asadullah73-ce · 2022-03-31T17:45:19Z

Hi Igor,
I have prepared the trials file using (https://github.com/kaldi-asr/kaldi/blob/master/egs/aishell/v1/local/produce_trials.py), but at the end of the script I get this error:
Key de__071xs-uBRZo__U__S10---0150.960-0167.120 not present in training iVectors
The key above is the utterance ID.
Please note that I created the trials file from the test data utt2spk.
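
The message alone does not say why the key is missing, but one way to narrow it down is to list which IDs in a given trials column do not appear in a set of known IDs, for example the first column of an utt2spk file (which has one `utt-id spk-id` pair per line). The sketch below is a hypothetical helper, not part of Kaldi; the function names and the `column` parameter are assumptions for illustration, and which column holds the ID the scoring step looks up depends on the recipe.

```python
from typing import Iterable, List, Set


def ids_from_lines(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-separated field of each non-empty line."""
    return {line.split()[0] for line in lines if line.strip()}


def missing_trial_keys(trial_lines: Iterable[str],
                       known_ids: Set[str],
                       column: int = 0) -> List[str]:
    """Return the IDs in the chosen trials column that are not in known_ids.

    Each ID is reported once, in order of first appearance.
    """
    missing: List[str] = []
    seen: Set[str] = set()
    for line in trial_lines:
        fields = line.split()
        if len(fields) <= column:
            continue  # skip blank or malformed lines
        key = fields[column]
        if key not in known_ids and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical in-memory data; in practice read the lines from your files.
    utt2spk_lines = ["wav1 en"]
    trial_lines = ["en wav1 target", "es wav1 nontarget", "en wav2 nontarget"]
    known = ids_from_lines(utt2spk_lines)
    print(missing_trial_keys(trial_lines, known, column=1))
```

Running the check against both the training and the test ID lists should show which set the trials file actually matches.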

Mlallena changed the title ~~Training & languges~~ Training & languages Jul 16, 2021
