[The testing result on a siren audio file seems not working from my end] #62

allenhung1025 · 2024-08-16T15:17:48Z

Hi @RetroCirce, thank you for your great work!
I am playing around with the model checkpoint you provided (HTSAT_AudioSet_Saved_6.ckpt).
I can load the checkpoint and make predictions. However, the predicted probabilities do not sum to 1. After I apply softmax, the results don't make sense to me because every probability becomes extremely low. Do you have any insight into why this happens?
I tried the model on a siren audio file. I believe it should be detected as either 323,/m/04qvtq,"Police car (siren)"
or 324,/m/012n7d,"Ambulance (siren)", but it is detected as music instead.
The Colab demo
Thanks again for the great work, and I hope to get some insight from you!

Before softmax

Running prediction...
pred probability sum: 2.45
[
  [
    137,
    "Music",
    0.6176819801330566
  ],
  [
    320,
    "Ice cream truck, ice cream van",
    0.30062487721443176
  ],
  [
    0,
    "Speech",
    0.13741374015808105
  ]
]

After softmax

pred probability sum: 1.00
[
  [
    137,
    "Music",
    0.0035008378326892853
  ],
  [
    320,
    "Ice cream truck, ice cream van",
    0.002549622440710664
  ],
  [
    0,
    "Speech",
    0.0021656793542206287
  ]
]
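Here is a quick consistency check on the two outputs above. It uses only the numbers printed in this comment. If softmax was applied to scores that had already gone through a sigmoid, then each post-softmax value should equal exp(p) / Z, where p is the pre-softmax score and Z is one normalizer shared by all classes. This is a sketch that tests that assumption. It is not part of the HTS-AT code.

```python
import math

# (pre-softmax score, post-softmax value) for the top-3 classes,
# copied from the "Before softmax" and "After softmax" outputs above.
pairs = {
    "Music": (0.6176819801330566, 0.0035008378326892853),
    "Ice cream truck, ice cream van": (0.30062487721443176, 0.002549622440710664),
    "Speech": (0.13741374015808105, 0.0021656793542206287),
}

# If softmax(p)_k = exp(p_k) / Z, then Z = exp(p_k) / softmax(p)_k
# should come out the same for every class.
normalizers = {name: math.exp(p) / s for name, (p, s) in pairs.items()}
for name, z in normalizers.items():
    print(f"{name}: Z = {z:.2f}")
```

All three normalizers come out at roughly 530. That is about what you would expect from 527 AudioSet classes whose sigmoid scores are mostly near 0, each contributing exp(0) = 1, plus a small excess from the few active classes. The result is consistent with softmax having been taken over sigmoid probabilities rather than over logits.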


allenhung1025 · 2024-08-16T15:24:47Z

When I use this audio file, the model detects it as a siren sound, which is amazing!

Running prediction...
pred probability sum: 1.00
[
  [
    396,
    "Siren",
    0.0029987646266818047
  ],
  [
    322,
    "Emergency vehicle",
    0.0028592264279723167
  ],
  [
    323,
    "Police car (siren)",
    0.0027548419311642647
  ]
]

But the probability is very low even for the top label. Is this normal?

RetroCirce · 2024-08-17T08:49:43Z

Hi Allen,

Glad that you like the model!
Thank you for your implementation and pull request. I will review it over the weekend and post comments if I have questions.

Regarding the siren sound, I think the first example does sound more like the “truck” and “music” (flute) sounds in AudioSet. Note that a siren is usually a high-frequency, interwoven sound associated with ambulances and police cars. I think the second example captures this sound better.

Regarding the low probability, I think you should not normalize the outputs to sum to one. The training uses binary cross-entropy (BCE) with sigmoid, not cross-entropy (CE) with softmax.
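The difference can be written out as follows. The notation is introduced here for illustration: z_k is the logit for class k, y_k in {0, 1} is its label, and K = 527 is the number of AudioSet classes.

Under BCE, the sigmoid scores are not tied to each other, so their sum can be anything between 0 and K. The last line is a bound on what softmax does when it is applied to scores that already lie in [0, 1]. The bound is derived here, not taken from HTS-AT. It shows that every output is squeezed into a narrow band around 1/K, no matter how confident the model is. That appears consistent with the values of roughly 0.002 to 0.0035 reported earlier in this thread.

```latex
% Multi-label training (BCE with sigmoid): one independent probability per class
p_k = \sigma(z_k) = \frac{1}{1 + e^{-z_k}}, \qquad
\mathcal{L}_{\mathrm{BCE}} = -\sum_{k=1}^{K} \bigl[\, y_k \log p_k + (1 - y_k)\log(1 - p_k) \,\bigr],
\qquad 0 \le \sum_{k=1}^{K} p_k \le K

% Single-label training (CE with softmax): probabilities compete and sum to 1
q_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad
\mathcal{L}_{\mathrm{CE}} = -\sum_{k=1}^{K} y_k \log q_k, \qquad \sum_{k=1}^{K} q_k = 1

% Softmax applied on top of sigmoid scores p_j \in [0, 1], with K = 527
\frac{1}{1 + (K - 1)e} \;\le\; \frac{e^{p_i}}{\sum_{j=1}^{K} e^{p_j}} \;\le\; \frac{e}{e + K - 1}
\quad\Longrightarrow\quad 0.0007 \lesssim \operatorname{softmax}(p)_i \lesssim 0.0051
```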

With BCE, each event is classified independently. So the raw sigmoid output (the value before your softmax), not the softmax value, is the right probability, I think.
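Below is a minimal sketch of reading multi-label outputs that way. It assumes `scores` holds the model's per-class sigmoid outputs, which are the pre-softmax values printed in the first run above. Labels are then ranked or thresholded independently instead of being normalized. The helper names and the 0.5 threshold are hypothetical choices for illustration. They are not HTS-AT settings.

```python
# Hypothetical label -> sigmoid score mapping, copied from the first run above.
# A real model output would cover all 527 AudioSet classes.
scores = {
    "Music": 0.6176819801330566,
    "Ice cream truck, ice cream van": 0.30062487721443176,
    "Speech": 0.13741374015808105,
}


def top_k(scores, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def active_labels(scores, threshold=0.5):
    """Return every (label, score) pair whose independent sigmoid score
    meets the (assumed) threshold, best first. Several labels can be
    active at once, and the scores need not sum to 1."""
    hits = [(label, p) for label, p in scores.items() if p >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


print(top_k(scores))
print(active_labels(scores))                 # only Music clears 0.5
print(active_labels(scores, threshold=0.1))  # all three clear 0.1
print(sum(scores.values()))                  # greater than 1, which is fine for multi-label
```

A per-class threshold is the usual way to turn independent sigmoid scores into decisions. The right value depends on how the scores are calibrated for each class, so 0.5 here is only a placeholder.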

best,
Ke
