Export to srt file #93

jooni22 · 2023-03-26T16:11:38Z

Hi, really great implementation, congratulations. The only thing I missed was saving to a subtitle format such as srt or vtt, so I added such a function for myself. I'm not a programmer, so I put everything into a single script that runs your implementation. If someone asks about this someday, you will have a link to a finished solution, or you can add it to the README. You can close the issue right away; there is no reason to keep it open, but your repository lacks a "Discussion" section. Thanks a lot, regards.

from faster_whisper import WhisperModel
import math

def convert_seconds_to_hms(seconds):
    hours, remainder = divmod(seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    milliseconds = math.floor((seconds % 1) * 1000)
    output = f"{int(hours):02}:{int(minutes):02}:{int(seconds):02},{milliseconds:03}"
    return output

model_path = "whisper-large-v2-ct2/"
# Run on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_path, device="cpu", compute_type="int8")
segments, info = model.transcribe("file.mp4", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
count = 0
with open('file.srt', 'w', encoding='utf-8') as f:  # Open file for writing; UTF-8 keeps non-ASCII text intact
    for segment in segments:
        count += 1
        duration = f"{convert_seconds_to_hms(segment.start)} --> {convert_seconds_to_hms(segment.end)}\n"
        text = f"{segment.text.lstrip()}\n\n"

        f.write(f"{count}\n{duration}{text}")  # Write formatted string to the file
        print(f"{duration}{text}",end='')
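
For anyone who also wants vtt: here is a minimal sketch, not part of faster-whisper, of a timestamp helper that works in whole milliseconds, plus a small WebVTT writer. The names `format_timestamp` and `to_vtt` are made up for this example, and the cues are assumed to be plain `(start, end, text)` tuples so the snippet runs without a model. Working in integer milliseconds avoids a float rounding quirk in `math.floor((seconds % 1) * 1000)`, which can come out one millisecond low for some inputs (1.001 is one such value).

```python
def format_timestamp(seconds: float, decimal_marker: str = ",") -> str:
    """Format seconds as HH:MM:SS<marker>mmm.

    SRT uses a comma before the milliseconds, WebVTT uses a period.
    """
    total_ms = round(seconds * 1000)  # work in whole milliseconds
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02}{decimal_marker}{ms:03}"


def to_vtt(cues) -> str:
    """Build a WebVTT document from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}")
        lines.append(text.strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

With real output you could pass `[(s.start, s.end, s.text) for s in segments]` to `to_vtt` and write the result to a `.vtt` file opened with `encoding="utf-8"`.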

guillaumekln · 2023-03-27T12:00:57Z

Thank you, this is useful. People will be able to find this issue by searching for "srt".

1 reply

Krathlusker Apr 2, 2023

If anyone is interested, haha, ChatGPT-3 and I wrote a script that does the same. It's for YouTube videos and works like this: python3 script.py --video "LINK TO VIDEO HERE"

It spits out a .srt file that only needs a pass in Subtitle Edit to fix common errors and split long lines. Works like a charm.
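
The long-line splitting can also be roughed out in Python before the file reaches a subtitle editor. This is only a sketch using the standard library's `textwrap`; the helper name `wrap_subtitle_text` and the 42-character default are assumptions for illustration, not something taken from the script mentioned above.

```python
import textwrap


def wrap_subtitle_text(text: str, max_chars: int = 42) -> str:
    """Break subtitle text into lines of at most max_chars characters.

    Lines are split at whitespace; a single word longer than max_chars
    is broken by textwrap's default behaviour.
    """
    return "\n".join(textwrap.wrap(text.strip(), width=max_chars))
```

It would be applied to `segment.text` before writing each cue. It does not split an overlong cue into several timed cues, so a manual pass can still be needed.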

phineas-pta · 2023-04-10T21:14:58Z

Let's combine it with the progress bar from #80:

import faster_whisper
import math
from tqdm import tqdm

model = faster_whisper.WhisperModel("large-v2", device="cuda")

def convert_to_hms(seconds: float) -> str:
    hours, remainder = divmod(seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    milliseconds = math.floor((seconds % 1) * 1000)
    output = f"{int(hours):02}:{int(minutes):02}:{int(seconds):02},{milliseconds:03}"
    return output

def convert_seg(segment: faster_whisper.transcribe.Segment) -> str:
    return f"{convert_to_hms(segment.start)} --> {convert_to_hms(segment.end)}\n{segment.text.lstrip()}\n\n"

segments, info = model.transcribe("file.mp4")

full_txt = []
timestamps = 0.0  # for progress bar
with tqdm(total=info.duration, unit=" audio seconds") as pbar:
    for i, segment in enumerate(segments, start=1):
        full_txt.append(f"{i}\n{convert_seg(segment)}")
        pbar.update(segment.end - timestamps)
        timestamps = segment.end
    if timestamps < info.duration: # silence at the end of the audio
        pbar.update(info.duration - timestamps)

with open("file.srt", mode="w", encoding="UTF-8") as f:
    f.writelines(full_txt)

Ayanaminn · 2023-04-15T06:07:34Z

Another method using the pysubs2 library:

from faster_whisper import WhisperModel
import pysubs2

model = WhisperModel('large-v2')  # model size or path, passed positionally
segments, _ = model.transcribe(audio='audio.mp3')

# to use pysubs2, the argument must be a segment list-of-dicts
results = []
for s in segments:
    segment_dict = {'start': s.start, 'end': s.end, 'text': s.text}
    results.append(segment_dict)

subs = pysubs2.load_from_whisper(results)
file_name = 'audio'  # output base name; assumed here to match the input file
# save srt file
subs.save(file_name + '.srt')
# save ass file
subs.save(file_name + '.ass')
