When transcribing German audio files with WhisperX and using whisperx.load_align_model, I noticed that dates containing periods (e.g., "Mai.2022") or numbers containing periods are incorrectly split into separate segments. This occurs because the period is interpreted as the end of a sentence, which leads to inaccurate time alignment and text segmentation. For example:
[00:00:59,879 --> 00:01:00,039] Die 5.
[00:01:00,479 --> 00:01:12,850] Strafkammer sah es als erwiesen an, dass der 52-Jährige wissentlich eine verbotene Nazi-Parole bei einer AfD-Veranstaltung im Mai 2021 verbreitet hatte.
[00:14:13,689 --> 00:14:15,992] Und nun die Wettervorhersage für morgen Mittwoch, den 15.
[00:14:16,032 --> 00:14:16,093] Mai.
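One possible workaround is to post-process the aligned segments and re-join any segment that ends in a number followed by a period with the segment after it. The following is a minimal sketch, not part of WhisperX: `merge_ordinal_splits` is a hypothetical helper, and it assumes each segment is a dict with `start`, `end`, and `text` keys (and optionally `words`), as in the `segments` list of an alignment result.

```python
import re

# Matches text that ends in 1-4 digits plus a period, e.g. "den 15." or "Die 5."
ORDINAL_END = re.compile(r"\b\d{1,4}\.$")


def merge_ordinal_splits(segments):
    """Join a segment ending in '<digits>.' with the segment that follows it.

    Hypothetical helper: assumes each segment is a dict with "start", "end"
    and "text" keys, and optionally a "words" list.
    """
    merged = []
    for seg in segments:
        if merged and ORDINAL_END.search(merged[-1]["text"].strip()):
            prev = merged[-1]
            # Append the text and extend the end time to cover both segments.
            prev["text"] = prev["text"].strip() + " " + seg["text"].strip()
            prev["end"] = seg["end"]
            # Keep word-level timings together if either segment has them.
            if "words" in prev or "words" in seg:
                prev["words"] = prev.get("words", []) + seg.get("words", [])
        else:
            # Shallow copy so the input list is not modified.
            merged.append(dict(seg))
    return merged
```

Applied to the four segments above, this would yield two segments: "Die 5. Strafkammer sah ..." and "... den 15. Mai.". Note that the heuristic also merges sentences that genuinely end in a number (for example a year at the end of a sentence), so it trades one kind of error for another and may need tuning.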
sijitang changed the title from "Issue with Periods in Dates or number Causing Incorrect Segment Splitting in German Transcriptions" to "Issue with Periods in Dates or numbers Causing Incorrect Segment Splitting in German Transcriptions" on May 26, 2024