Hi @shubhr and @c4dm, the features extracted from the 2022 data (saved in `Mel_train.h5`) using the exact code are all zeros for ~80% of the audio files. I investigated the issue and found that 4 audio files (and their annotations) in the WMW data folder have a first POS label with Starttime 0. Because the current code applies a margin of 25 ms around the onsets and offsets (the `time_2_frame(df,fps)` function in `Feature_extract.py`), the Starttime of the first POS label for these files becomes negative. When the timestamps are converted to frames, the frame indices are also negative, and `pcen_patch` at `Feature_extract.py`:line 62 becomes an empty array. As a result, all the previous entries of `hf['features'][file_index]` at `Feature_extract.py`:line 65 contain zeros instead of the actual values. The 4 audio files are XC406576.wav, XC417425.wav, XC440361.wav, and XC483906.wav. This issue does not occur with the 2021 training data, as it contains no audio files (or annotations) with such a case. I am not sure whether the baseline training was done using these zero-valued features, which might create embeddings/prototypes that are not representative of the actual classes and data.
If I add the following check at `Feature_extract.py`:line 55, the issue is avoided by setting all negative frame indices to 0. The same check can be applied to the evaluation features to avoid the same issue.
    if str_ind < 0:
        str_ind = 0
    if end_ind < 0:
        end_ind = 0
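To make the failure mode concrete, here is a minimal sketch of why a negative start index yields an empty patch and how clamping avoids it. It uses a plain Python list as a stand-in for the PCEN array (NumPy arrays follow the same slicing rules); the helper name `clamp_to_zero`, the stand-in data, and the example indices are assumptions for illustration, not code from the repository.

```python
def clamp_to_zero(str_ind, end_ind):
    """Hypothetical helper: replace negative frame indices with 0."""
    return max(int(str_ind), 0), max(int(end_ind), 0)


# Stand-in for the per-file feature array: 1000 frames.
pcen = list(range(1000))

# A label starting at time 0 minus the margin gives a negative start frame.
# Example values only; the real indices depend on fps and the margin.
str_ind, end_ind = -2, 50

# A negative start counts from the end of the sequence, so this slice
# runs from frame 998 "up to" frame 50 and is therefore empty.
bad_patch = pcen[str_ind:end_ind]

# With the indices clamped, the slice covers frames 0..49 as intended.
start, end = clamp_to_zero(str_ind, end_ind)
good_patch = pcen[start:end]

print(len(bad_patch), len(good_patch))
```

Clamping only the lower bound mirrors the check proposed above; whether the end index also needs an upper bound against the file length is a separate question this sketch does not address.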
Most of the features array is zero, and the Mel_train.h5 created is just 65 MB, which seems very small. Please solve this issue so that the baseline results can be reproduced.