This repository has been archived by the owner on Oct 1, 2024. It is now read-only.
Hi @sc0ty!
Thanks for this very useful tool.
Could you give some hints on how to parameterize synchronization when operating on "reduced speech"?
For a year, I downloaded a "popular" French talk show in which "official specialists" tell "the official truth" about any event of public interest. Now I am trying to match their "truth" against "the reality" of the events, and to compile extracts into a "time-lapse" of the way they relate "the official truth".
First I parse all the SRT files with some regexes, producing a list of dicts of the interesting parts:
[ { text: (re_res, cc_text), video: (v_path, p_hms) }, … ]
Then, from the time code, I can jump into the videos at the hms point to choose the extracts.
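For illustration, that parsing step could be sketched like this: a minimal Python sketch where `CUE_RE` and `find_interesting` are hypothetical names, and `keyword_re` stands in for whatever regex marks a part as interesting:

```python
import re

# One SRT cue: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm", then text lines.
CUE_RE = re.compile(
    r"(\d+)\n(\d{2}:\d{2}:\d{2}),\d{3}[^\n]*\n(.*?)(?:\n\n|\Z)",
    re.DOTALL,
)

def find_interesting(srt_text, v_path, keyword_re):
    """Collect cues whose text matches keyword_re, keeping the jump point."""
    hits = []
    for m in CUE_RE.finditer(srt_text):
        _idx, p_hms, cc_text = m.groups()
        re_res = keyword_re.search(cc_text)
        if re_res:
            hits.append({
                "text": (re_res.group(0), cc_text.strip()),
                "video": (v_path, p_hms),  # p_hms = HH:MM:SS of the cue start
            })
    return hits
```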
But the time codes are erratic (the same for VTT and SRT), seemingly at random, with offsets of up to ±10 s! That makes it very difficult to retrieve the precise locations of the interesting parts: as they stand, the time codes are useless.
The CC were provided by the talk-show production when the videos were downloaded from their replay platform, and at each cc_point the cc_text is a summary of the actual chunk of speech. Still, each summarized chunk contains a fairly consistent verbatim part of its original, which makes me think (and hope) that synchronization could succeed.
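One way to check that assumption before tuning the synchronizer is to measure the longest verbatim word run shared by a cc_text and the corresponding stretch of real speech (for example a rough transcript of the audio around the cue). A minimal sketch with Python's difflib; `verbatim_overlap` is a hypothetical helper, not part of this tool:

```python
import difflib

def verbatim_overlap(cc_text, spoken_text):
    """Longest contiguous run of words shared by a CC summary and the speech."""
    a = cc_text.lower().split()
    b = spoken_text.lower().split()
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    m = sm.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[m.a:m.a + m.size])
```

If the returned run is long relative to the cue, the verbatim anchor exists and word-level matching has something to lock onto.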
But there are many #synchronization-options, and I feel lost. Is there a parameter set I could start with to handle this case?
Thanks again !
Best, Stanislas.