
Can NeMo do Forced Alignment with provided transcript and match audio file - & word level timing? #7041

Closed Answered by erastorgueva-nv
CodeFusionFX asked this question in Q&A

Hello, thank you for your question.

You can indeed do all of the above with NeMo, using a tool inside this repository called NeMo Forced Aligner (NFA). You can find information on how to use it here: https://github.com/NVIDIA/NeMo/tree/main/tools/nemo_forced_aligner. We are also planning to release a tutorial on how to use NFA soon.

By default, NFA produces token-level and word-level (i.e. substrings separated by spaces) timings. It is also possible to obtain sentence-level or phrase-level timings. This is because NFA can produce timings for user-defined groups of words (in NFA, we call these “segments”). To make sure NFA produces these timings, you need to mark the boundaries between seg…
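To make the idea of segment-level timings concrete, here is a minimal sketch in plain Python. It is not NFA's code or API; it only illustrates how a timing for a user-defined group of words could be derived from word-level timings, by taking the start of the first word and the end of the last word in each group. The `Timing` class, the `group_into_segments` function, the `"|"` separator, and the sample timings are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Hypothetical container for one aligned unit (a word or a segment)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def group_into_segments(words, transcript, separator="|"):
    """Derive segment timings from word timings.

    `transcript` is the reference text with `separator` placed between
    segments. The separator is a placeholder chosen for this sketch.
    """
    segments = []
    i = 0  # index of the next unconsumed word timing
    for chunk in transcript.split(separator):
        tokens = chunk.split()
        if not tokens:
            continue  # ignore empty segments, e.g. from a trailing separator
        span = words[i:i + len(tokens)]
        if [w.text for w in span] != tokens:
            raise ValueError(f"word timings do not match segment: {chunk!r}")
        # A segment starts with its first word and ends with its last word.
        segments.append(Timing(" ".join(tokens), span[0].start, span[-1].end))
        i += len(tokens)
    return segments


# Made-up word-level timings, standing in for an aligner's output.
words = [
    Timing("hello", 0.0, 0.4),
    Timing("world", 0.5, 0.9),
    Timing("good", 1.2, 1.5),
    Timing("morning", 1.6, 2.1),
]
segments = group_into_segments(words, "hello world | good morning")
for seg in segments:
    print(f"{seg.start:.2f}-{seg.end:.2f}  {seg.text}")
```

In this sketch the word timings are the source of truth and the segments are computed from them afterwards; how NFA itself computes and writes out segment timings is described in its README linked above.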

Answer selected by erastorgueva-nv