-
Shouldn't these options explicitly prohibit a 20-second silence in mid-line? Especially when the line is only 25 characters, about 5 words, anyway?! What am I doing wrong?
P.S. Is there also a way to output both LRC and SRT at the same time? I need both, I don't want to run it twice, and I don't trust the converter I wrote.
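For anyone else who needs both formats and doesn't trust a home-grown converter: here is a minimal sketch of an SRT-to-LRC conversion, assuming simple SRT cues and keeping only each cue's start time (LRC has no end times). The function name and regex are mine, not from any tool in this thread:

```python
import re

def srt_to_lrc(srt_text: str) -> str:
    """Convert SRT cues to LRC lines, keeping only each cue's start time."""
    lrc_lines = []
    # Match "HH:MM:SS,mmm --> ..." followed by the caption text,
    # terminated by a blank line or end of input.
    pattern = re.compile(
        r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*[\d:,]+\s*\n(.+?)(?:\n\n|\Z)",
        re.S,
    )
    for h, m, s, ms, text in pattern.findall(srt_text):
        minutes = int(h) * 60 + int(m)       # LRC folds hours into minutes
        centis = int(ms) // 10               # LRC uses centiseconds
        text = " ".join(text.splitlines())   # flatten multi-line cues
        lrc_lines.append(f"[{minutes:02d}:{int(s):02d}.{centis:02d}]{text}")
    return "\n".join(lrc_lines)
```

This ignores cue numbering and styling entirely; it is a sanity-check sketch, not a full SRT parser.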
-
I also run into this ... a single word left on the screen for 60 seconds during a guitar solo. It's like Whisper thinks somebody has a VERRRRRY legato voice 😂
-
No.
You are touching the VAD settings; don't touch them. You can try:
--vad_alt_method pyannote_v3
--output_format all
-
I've now run this through several hundred songs.
After a lot of trial and error I'm up to the 20th iteration of my prompt, frequently trying multiple runs on one file to see what's best. And what's best for me is the settings I have currently. I still don't understand why there can be a 30-second silence inside the same subtitle.
-
Accuracy of transcription is not "pointless". I don't know why you're so emotional about this and turned it into a philosophical discussion. It was a technical question about command-line parameters and how to avoid a 30-second silence in mid-caption.

Technical: if I turned on word timestamps, wouldn't it know the words are 30 seconds apart and be able to separate them? Just as whisper-large-v3 isn't as good as whisper-large-v2, the model you use depends on the data it was given. Whisper-large-v3 also hallucinates during silence because of improperly imported YouTube subtitles that weren't actually spoken in the videos used as training data.

Philosophical: people and data aren't perfect. That's why we need options and custom solutions, and why I have to roll my own when nobody else's is quite right for me. I've been doing that since the 1980s.

Technical: and clearly pyannote's test data did not include the kinds of music I listen to. It's awful for heavy music.

Philosophical: people's lived experiences count.
-
Surely, if word timestamps were on, there could be a parameter to separate two words that are a certain number of seconds apart into separate subtitles. I wish I'd thought of this when I asked the question; it's a good idea.
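That idea is easy to sketch: given word-level timestamps, start a new subtitle whenever the silence between consecutive words exceeds a chosen gap. This is a hand-rolled illustration, not an existing CLI option; the `(start, end, text)` tuple format is an assumption:

```python
def split_on_gaps(words, max_gap=2.0):
    """Split a list of (start, end, text) word timestamps into groups
    wherever the silence between consecutive words exceeds max_gap seconds."""
    groups = []
    current = []
    for word in words:
        # Gap = this word's start minus the previous word's end.
        if current and word[0] - current[-1][1] > max_gap:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return groups
```

Each group would then become its own subtitle, so a 30-second pause mid-line turns into two separate captions instead of one long one.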
-
Just FYI: the max-gap options seemed to help with the silences between words, but what REALLY, REALLY helped lyrics split properly as they are sung? Adding a period to the end of EVERY line of a downloaded lyric file, even when it doesn't make grammatical sense.

It was my wife's idea. "Why not add invisible periods?" she said. "No such thing, how dumb!" I said, then immediately realized its brilliance. They can be removed afterward, hence technically "invisible". The line breaks in posted lyrics are usually good points at which to consider ending a subtitle.

Combined with --sentence, this fixed my problems with long captions; words now come out as they are sung. This was NOT the case with the exact same options but a comma at the end of each line instead of a period. So I ended up writing a postprocessor in Perl to strip all the trailing periods afterward.
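The invisible-period trick (the author's postprocessor was in Perl) can be sketched in a few lines of Python: append a period to every lyric line before using it as the prompt, then strip trailing periods from the finished captions. Function names are mine, for illustration only:

```python
def add_invisible_periods(lyrics: str) -> str:
    """Append a period to every non-empty lyric line that doesn't already
    end in sentence punctuation, nudging the model to break captions there."""
    out = []
    for line in lyrics.splitlines():
        stripped = line.rstrip()
        if stripped and not stripped.endswith((".", "!", "?")):
            out.append(stripped + ".")
        else:
            out.append(stripped)
    return "\n".join(out)

def remove_invisible_periods(captions: str) -> str:
    """Postprocess: strip trailing periods so they stay 'invisible'.
    Note this also removes a legitimate final period or ellipsis."""
    return "\n".join(line.rstrip().rstrip(".") for line in captions.splitlines())
```

Whether the punctuation actually survives into caption boundaries depends on the model and options; this only shows the mechanical pre/post steps.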