-
I'm unaffiliated with NVIDIA and the model you mention, but I'm familiar with this data, so I'll give you my two cents. You cannot find ULCA for download because it contains explicitly copyrighted materials. The authors presumably do not have permission to distribute the data, and they ghost you if you ask about it (Open-Speech-EkStep/ULCA-asr-dataset-corpus#4). If you are not already aware of it, you may be interested in the Shrutilipi dataset. Though Shrutilipi is also collected from broadcasts, it differs from ULCA in these ways:
-
I am working on a Hindi ASR system and am collecting transcribed Hindi speech data for it. In the process, I found the ULCA ASR Dataset Corpus on GitHub, but the links seem to be broken. I also found that the NVIDIA RIVA Hindi ASR models are trained on the ULCA Hindi ASR Dataset Corpus. Are there any other sources for this dataset?
Can anyone please help 🙏.