Clarification on Frame classification models #6975

Answered by stevehuang52
pehonnet asked this question in Q&A

Hi @pehonnet, thanks for your questions~

The input to the model shouldn't be shorter than n_fft/2, which is 256 in our default config. The reason lies in how the STFT is calculated. Before the STFT is computed, the input is zero-padded by n_fft/2 on each side, where n_fft is the size of the window that the STFT is actually performed on. If the input is shorter than n_fft/2, the frame covering the first (leftmost) audio samples also reaches into the right padding, which is not desired. Since 160 samples (in your case) is shorter than 256, the error message is raised. Please refer to STFT and librosa for details.
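To make the arithmetic concrete, here is a minimal sketch in plain Python (not the NeMo or librosa implementation). It assumes n_fft = 512, inferred from n_fft/2 = 256 above, and a centered STFT whose first frame starts at the beginning of the padded signal. The helper name `first_frame_composition` is hypothetical.

```python
def first_frame_composition(num_samples: int, n_fft: int = 512) -> dict:
    """Count how many samples of the first STFT frame come from the left
    padding, the real audio, and the right padding.

    Assumes the signal is padded by n_fft // 2 on each side, so the padded
    layout is [left pad | audio | right pad], and the first frame covers
    padded indices [0, n_fft).
    """
    pad = n_fft // 2
    left = pad                                # the frame always starts in the left pad
    audio = min(num_samples, n_fft - left)    # real samples that fit in the frame
    right = n_fft - left - audio              # whatever is left spills into the right pad
    return {"left_pad": left, "audio": audio, "right_pad": right}


if __name__ == "__main__":
    # 160 samples: shorter than n_fft/2, so the first frame reaches the right pad.
    print(first_frame_composition(160))
    # 256 samples: exactly n_fft/2, so the first frame holds no right padding.
    print(first_frame_composition(256))
```

Under these assumptions, a 160-sample input leaves 96 samples of the first frame to be filled from the right padding, while any input of 256 samples or more leaves none.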

Will I get the same results if I provide multiple audio frames at once versu…

Answer selected by pehonnet