With the disclaimer that I'm not too familiar with the speaker recognition codebase, this does look like an error I used to run into when recording my own test data and forgetting to "flatten" the channels. The NeMo audio file loaders do normalize the sample rate, but I don't think they handle multiple channels properly, so you'll have to average the left/right channels of your recordings into a single mono channel beforehand.
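In case it helps, here is a minimal sketch of that averaging step using only the Python standard library. It assumes 16-bit PCM .wav input; the function name `downmix_to_mono` is made up for illustration and is not part of NeMo. It won't read .ogg files, so those would need to be converted with some other tool first.

```python
import array
import sys
import wave


def downmix_to_mono(src_path: str, dst_path: str) -> None:
    """Average all channels of a 16-bit PCM WAV file into one channel.

    Hypothetical helper for illustration; not a NeMo function.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved per frame: L0, R0, L1, R1, ...
    # Average each frame's channels into a single sample.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)  # sample rate is left unchanged
        dst.writeframes(mono.tobytes())
```

You would call it once per recording, e.g. `downmix_to_mono("enroll_stereo.wav", "enroll_mono.wav")` (placeholder file names), and then point the script at the mono copies. A file that is already mono passes through unchanged.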
-
I'd like to test the speaker recognition process using a few enrollment audio files and a variety of test files. I have a few .wav recordings of myself speaking as the enrollment set, and a bunch of .ogg and .wav files of other speakers, with myself mixed in, as the test set. However, the script always blows up with tensor shape errors. Is there some assumption made about the input format? These recordings have different sample rates and possibly different channel counts, but I assumed they'd be normalized.
Example error:
Any help would be appreciated. Thank you!