Can this be used to mute non speech parts of an audio? #27

orionflame · 2024-04-05T17:00:58Z

Hi,

I recorded a lot of narration myself for a tutorial, and I am trying to clean up the audio files by removing anything that is not speech, which is mostly throat clearing and similar noises. Here is a very short sample:

https://www.dropbox.com/scl/fi/kotmse874x4rsi86kr8f8/voice3.mp3?rlkey=l5m56g5axort1ru70goo3rvch&dl=1

I couldn't install this library locally yet because of some dependency errors, so I used the Hugging Face version (time res = 1.6) and got this:

0.0s-6.9s: pretty much everything you could want that occur around the normal vector not
6.9s-13.3s: along it. Keenan Crane is one of the leading
13.3s-17.2s: researchers in computational geometry.

The first thing I noticed is that I said "Keenan" three times in the recording. The first two were retakes, so only the last one should remain, yet the transcript shows the name only once. You can hear this in the audio. Does this library also de-duplicate words?

For tags I got these:
0.0s-1.6s: Speech, Narration, monologue, Speech synthesizer, Clicking, Male speech, man speaking
1.6s-3.2s: Speech, Narration, monologue, Speech synthesizer, Clicking, Male speech, man speaking
3.2s-4.8s: Speech, Inside, small room, Clicking, Speech synthesizer, Narration, monologue
4.8s-6.4s: Speech, Narration, monologue, Speech synthesizer, Male speech, man speaking
6.4s-8.0s: Speech, Narration, monologue, Clicking, Speech synthesizer, Inside, small room
8.0s-9.6s: Speech, Clicking, Inside, small room
9.6s-11.2s: Speech, Clicking, Inside, small room, Narration, monologue, Male speech, man speaking
11.2s-12.8s: Speech, Speech synthesizer
12.8s-14.4s: Sine wave
14.4s-16.0s: Sine wave, Hum, Chime, White noise, Boiling

How can I use these tags to keep only the speech? I have already written code that uses timestamps to mute the parts between words. I tried whisper, but it still kept the coughing and throat-clearing parts.
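One way to turn tag lines like the ones above into keep/mute intervals could be sketched as follows. This is a minimal sketch, not part of the library: the function names `parse_tag_lines`, `speech_intervals`, and `mute_intervals` are hypothetical, and it assumes the tagger output is plain text in the `start-end: tags` form shown above and that a window counts as speech when it carries the exact tag `Speech`.

```python
import re

# Matches lines such as "0.0s-1.6s: Speech, Narration, monologue"
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s-(\d+(?:\.\d+)?)s:\s*(.*)$")


def parse_tag_lines(text):
    """Parse tagger output into a list of (start, end, tags) windows."""
    windows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a tag line
        start, end = float(match.group(1)), float(match.group(2))
        # Some labels contain commas themselves ("Narration, monologue"),
        # so the pieces are fragments; the exact piece "Speech" is still
        # unambiguous ("Speech synthesizer" and "Male speech" do not match).
        tags = [piece.strip() for piece in match.group(3).split(",")]
        windows.append((start, end, tags))
    return windows


def speech_intervals(windows, label="Speech"):
    """Merge consecutive windows that carry `label` into (start, end) spans."""
    merged = []
    for start, end, tags in windows:
        if label not in tags:
            continue
        if merged and abs(merged[-1][1] - start) < 1e-6:
            merged[-1] = (merged[-1][0], end)  # extend the previous span
        else:
            merged.append((start, end))
    return merged


def mute_intervals(keep, total_duration):
    """Return the gaps between the kept spans, i.e. the parts to mute."""
    gaps = []
    cursor = 0.0
    for start, end in keep:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_duration:
        gaps.append((cursor, total_duration))
    return gaps
```

The resulting gaps could be fed to the existing muting code. Note that in the tag list above every window up to 12.8s carries the `Speech` tag, so at a 1.6s resolution this approach would only mute the tail after 12.8s; a throat clearing that shares a window with speech would not be removed.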

I also tried whisperHallu, but it had its own issues, such as cutting some words off halfway.

All I need is to keep only the speech parts. After that, I will have to figure out a way to remove retakes. Sometimes a retake is a single word and sometimes it is half a sentence repeated multiple times, but it is always the last take that should be kept.
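The "keep the last take" rule could be sketched roughly as below. This is only an illustration under strong assumptions: word-level timestamps are already available as `(word, start, end)` tuples, a retake is a verbatim, immediate repetition of one or more words, and `drop_retakes` is a hypothetical helper name. It would also collapse intentional repetitions such as "very very", and it would miss retakes that are not word-for-word identical.

```python
def drop_retakes(words, max_len=8):
    """Collapse immediately repeated runs of words, keeping the last take.

    `words` is a list of (word, start, end) tuples in spoken order.
    Runs of up to `max_len` words are compared case-insensitively.
    """
    out = list(words)
    changed = True
    while changed:
        changed = False
        # Try longer repeated phrases first, then shorter ones.
        for n in range(min(max_len, len(out) // 2), 0, -1):
            i = 0
            while i + 2 * n <= len(out):
                first = [w[0].lower() for w in out[i:i + n]]
                second = [w[0].lower() for w in out[i + n:i + 2 * n]]
                if first == second:
                    del out[i:i + n]  # drop the earlier take, keep the later one
                    changed = True
                else:
                    i += 1
    return out
```

The timestamps of the dropped tuples give the regions to mute, so this could reuse the same muting step as the non-speech removal.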

Any ideas?

dgoryeo · 2024-08-10T20:21:38Z

Hi @orionflame, did you by any chance find a solution to your question?

orionflame · 2024-08-11T16:52:46Z

Unfortunately, no. Do you have any leads?

dgoryeo · 2024-08-11T19:39:08Z

I was wondering whether one could distill the 527-class AudioSet label set down to a much smaller list of events, say fewer than 10, to be used with this method:

audio_tag_result = whisper.parse_at_label(result, language='follow_asr', top_k=5, p_threshold=-1, include_class_list=list(range(527)))
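A reduced class list for `include_class_list` could be built from label names rather than hand-picked numbers. The sketch below assumes a mapping from AudioSet class index to display name is available (for example, loaded from the AudioSet label file); `build_include_class_list` is a hypothetical helper, and the tiny mapping in the usage note is a toy stand-in whose indices should not be relied on.

```python
def build_include_class_list(index_to_label, wanted_labels):
    """Return the sorted class indices whose display name is in `wanted_labels`.

    `index_to_label` maps an integer class index to its display name.
    Matching is case-insensitive and requires the full display name.
    """
    wanted = {name.lower() for name in wanted_labels}
    return sorted(
        index
        for index, name in index_to_label.items()
        if name.lower() in wanted
    )
```

For example, with a toy mapping `{0: "Speech", 1: "Cough", 2: "Throat clearing", 3: "Sine wave"}` and the wanted labels `["Speech", "Cough", "Throat clearing"]`, the helper returns `[0, 1, 2]`, which would then replace `list(range(527))` in the call above.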
