Hi,
I recorded a lot of narration myself for a tutorial I made, and I'm trying to clean up the audio files by removing anything that isn't speech (mostly throat clearing, etc.). Here is a very short sample:
https://www.dropbox.com/scl/fi/kotmse874x4rsi86kr8f8/voice3.mp3?rlkey=l5m56g5axort1ru70goo3rvch&dl=1
I haven't been able to install this library locally yet because of some dependency errors, so I used the Hugging Face version (time resolution = 1.6 s) and got this transcript:
0.0s-6.9s: pretty much everything you could want that occur around the normal vector not
6.9s-13.3s: along it. Keenan Crane is one of the leading
13.3s-17.2s: researchers in computational geometry.
The first thing I noticed is that I said "Keenan" three times (retakes, where only the last one should remain). You can hear this in the audio, but the transcript contains it only once. Is this library also de-duplicating words?
For the audio tags, I got these:
0.0s-1.6s: Speech, Narration, monologue, Speech synthesizer, Clicking, Male speech, man speaking
1.6s-3.2s: Speech, Narration, monologue, Speech synthesizer, Clicking, Male speech, man speaking
3.2s-4.8s: Speech, Inside, small room, Clicking, Speech synthesizer, Narration, monologue
4.8s-6.4s: Speech, Narration, monologue, Speech synthesizer, Male speech, man speaking
6.4s-8.0s: Speech, Narration, monologue, Clicking, Speech synthesizer, Inside, small room
8.0s-9.6s: Speech, Clicking, Inside, small room
9.6s-11.2s: Speech, Clicking, Inside, small room, Narration, monologue, Male speech, man speaking
11.2s-12.8s: Speech, Speech synthesizer
12.8s-14.4s: Sine wave
14.4s-16.0s: Sine wave, Hum, Chime, White noise, Boiling
How can I use these tags to keep only the speech? I have already written code that uses the word timestamps to mute everything between words. I tried Whisper, but it still kept the coughing and throat-clearing parts.
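A minimal sketch of one way the tags could gate the word timestamps, assuming the tag output has been converted by hand into `(start_s, end_s, labels)` tuples and the words are available as `(text, start_s, end_s)` tuples. The helpers `speech_windows` and `keep_words` and the label sets are hypothetical; "Cough", "Throat clearing" and "Breathing" are AudioSet class names that did not appear in the posted output, so the veto only helps if the tagger actually emits them. Labels are kept as whole strings because several AudioSet names contain commas (e.g. "Narration, monologue").

```python
# Hypothetical label sets: which tags count as speech, and which veto a window.
SPEECH_LABELS = {"Speech", "Narration, monologue", "Male speech, man speaking"}
NON_SPEECH_LABELS = {"Cough", "Throat clearing", "Breathing"}

# The posted tag output, transcribed into (start_s, end_s, labels) tuples.
MALE = "Male speech, man speaking"
NARR = "Narration, monologue"
ROOM = "Inside, small room"
POSTED_TAGS = [
    (0.0, 1.6, ["Speech", NARR, "Speech synthesizer", "Clicking", MALE]),
    (1.6, 3.2, ["Speech", NARR, "Speech synthesizer", "Clicking", MALE]),
    (3.2, 4.8, ["Speech", ROOM, "Clicking", "Speech synthesizer", NARR]),
    (4.8, 6.4, ["Speech", NARR, "Speech synthesizer", MALE]),
    (6.4, 8.0, ["Speech", NARR, "Clicking", "Speech synthesizer", ROOM]),
    (8.0, 9.6, ["Speech", "Clicking", ROOM]),
    (9.6, 11.2, ["Speech", "Clicking", ROOM, NARR, MALE]),
    (11.2, 12.8, ["Speech", "Speech synthesizer"]),
    (12.8, 14.4, ["Sine wave"]),
    (14.4, 16.0, ["Sine wave", "Hum", "Chime", "White noise", "Boiling"]),
]


def speech_windows(segments, speech=SPEECH_LABELS, veto=NON_SPEECH_LABELS):
    """Merge contiguous tag windows that contain a speech label and no vetoed label."""
    windows = []
    for start, end, labels in segments:
        tags = set(labels)
        if tags & speech and not tags & veto:
            if windows and abs(windows[-1][1] - start) < 1e-6:
                windows[-1] = (windows[-1][0], end)  # extend the previous window
            else:
                windows.append((start, end))
    return windows


def keep_words(words, windows, min_overlap=0.5):
    """Keep (text, start_s, end_s) words whose duration overlaps the speech
    windows by at least min_overlap (as a fraction of the word's length)."""
    kept = []
    for text, ws, we in words:
        duration = max(we - ws, 1e-6)
        overlap = sum(max(0.0, min(we, e) - max(ws, s)) for s, e in windows)
        if overlap / duration >= min_overlap:
            kept.append((text, ws, we))
    return kept


if __name__ == "__main__":
    print(speech_windows(POSTED_TAGS))  # [(0.0, 12.8)]
```

With the posted tags this keeps 0.0s-12.8s and drops everything after it, even though the transcript places "researchers in computational geometry" at 13.3s-17.2s. At a 1.6 s resolution the tags are probably too coarse to be the only filter; they may work better as one signal combined with the existing word-timestamp muting.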
I also tried whisperHallu, but it sometimes cut words off halfway.
All I need is to keep only the speech parts. After that, I'll have to figure out how to remove retakes: sometimes it's a single word, sometimes half a sentence repeated several times, but it's always the last take that should be kept.
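A rough sketch of the "keep the last take" rule, assuming the transcript still contains every take as word-level `(text, start_s, end_s)` tuples (which, per the question above, may not be the case with this library). The function `remove_retakes` is hypothetical: it deletes a run of words whenever the same run follows immediately, trying longer runs first so a repeated half-sentence is removed as a unit.

```python
import re


def normalize(word):
    """Lowercase and strip punctuation so "Keenan," matches "keenan"."""
    return re.sub(r"[^\w']", "", word.lower())


def remove_retakes(words, max_len=12):
    """Drop earlier takes of back-to-back repeated word runs, keeping the last.

    words: list of (text, start_s, end_s) tuples in time order.
    Only exact repeats (after normalize) that follow each other directly are
    detected; partial or reworded retakes are not.
    """
    words = list(words)
    changed = True
    while changed:
        changed = False
        # Longer runs first, so "the normal vector the normal vector" is
        # handled as one retake rather than word by word.
        for k in range(min(max_len, len(words) // 2), 0, -1):
            i = 0
            while i + 2 * k <= len(words):
                first = [normalize(w[0]) for w in words[i:i + k]]
                second = [normalize(w[0]) for w in words[i + k:i + 2 * k]]
                if first == second:
                    del words[i:i + k]  # remove the earlier take, keep the later one
                    changed = True
                else:
                    i += 1
    return words


if __name__ == "__main__":
    takes = [("Keenan", 6.9, 7.2), ("Keenan", 7.5, 7.8), ("Keenan,", 8.1, 8.4),
             ("Crane", 8.5, 8.9), ("is", 9.0, 9.1)]
    for text, start, end in remove_retakes(takes):
        print(f"{start:.1f}s-{end:.1f}s: {text}")
```

Because the kept words retain their timestamps, the removed takes fall into the gaps that the existing muting code already silences. The exact-match rule will also remove intentional repetitions such as "very very", and it misses retakes where the first attempt was cut off mid-word; a fuzzy comparison (for example with Python's `difflib`) could be swapped in for the equality check.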
Any ideas?