Add files via upload · ActiveInferenceInstitute/Journal-Utilities@e7cdbfb

Commit

Add files via upload

The incomplete Python program "richerSentenceWords.py" is an enhanced version of "sentencesToTranscripts.py"; it takes an additional Google/YouTube word-by-word CC SRT transcript as supplemental input.
"richerSentenceWords.py" matches the YouTube SRT to the /SENTENCES output from OpenAI/Whisper. When the two disagree, it often uses the YouTube reading, which is frequently more accurate than Whisper's guess.
In addition, Whisper suppresses time-fillers such as "um" and "uh," along with some instances of "you know," "I think," and repetitions; no such suppression occurs in auto-generated YouTube CCs. Per Active Inference Journal convention, these superficially unimportant vocal gestures are (a) retained in the Gold Standard version of each audiovisual session; (b) enclosed in {curly braces} in SRT closed captioning; and (c) deleted in "printable journal" articles.
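
The matching and the three output conventions described above can be illustrated with a minimal sketch. It is not the code in "richerSentenceWords.py": the function names `align` and `render`, the use of `difflib.SequenceMatcher`, and the plain word-list inputs are all assumptions made for illustration. The sketch also simplifies in two ways: it always takes the YouTube reading on a disagreement (the program only often does), and it treats every word found only in the YouTube transcript as one that Whisper suppressed.

```python
from difflib import SequenceMatcher

STYLES = ("gold", "srt", "journal")  # hypothetical names for the three outputs


def align(whisper_words, youtube_words):
    """Align two word lists and return (text, suppressed) pairs.

    Assumption: both inputs are already split into plain words.
    """
    matcher = SequenceMatcher(
        a=[w.lower() for w in whisper_words],
        b=[w.lower() for w in youtube_words],
        autojunk=False,
    )
    tokens = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            tokens += [(w, False) for w in whisper_words[i1:i2]]
        elif tag == "replace":
            # Disagreement: this sketch always prefers the YouTube reading.
            tokens += [(w, False) for w in youtube_words[j1:j2]]
        elif tag == "insert":
            # Present only in YouTube: assume Whisper suppressed it.
            tokens += [(w, True) for w in youtube_words[j1:j2]]
        else:  # "delete": present only in Whisper, so keep it
            tokens += [(w, False) for w in whisper_words[i1:i2]]
    return tokens


def render(tokens, style):
    """Render tokens as Gold Standard text, SRT text, or journal text."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    out = []
    for text, suppressed in tokens:
        if not suppressed or style == "gold":
            out.append(text)                 # (a) retained as spoken
        elif style == "srt":
            out.append("{" + text + "}")     # (b) enclosed in curly braces
        # (c) "journal": suppressed words are dropped
    return " ".join(out)


whisper = ["so", "we", "begin", "with", "the", "principal"]
youtube = ["so", "um", "we", "begin", "with", "the", "principle"]
tokens = align(whisper, youtube)
print(render(tokens, "gold"))     # so um we begin with the principle
print(render(tokens, "srt"))      # so {um} we begin with the principle
print(render(tokens, "journal"))  # so we begin with the principle
```

Whether a multi-word filler such as "you know" gets one pair of braces or one pair per word is not stated in the commit message; the sketch braces each word separately.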


BazookamanPH authored Nov 28, 2022

1 parent 661b8ce commit e7cdbfb
