Add files via upload
The incomplete Python program "richerSentenceWords.py" is an enhanced version of "sentencesToTranscripts.py"; it takes an additional Google/YouTube word-by-word closed-caption (CC) SRT transcript as supplemental input.
"richerSentenceWords.py" matches the YouTube SRT against the /SENTENCES output from OpenAI/Whisper. Where the two disagree, it often uses the YouTube reading, which tends to be more accurate than Whisper's guess.
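The matching step described above can be illustrated with a minimal sketch. It is not the code in "richerSentenceWords.py"; the function name `merge_words`, the punctuation-stripping rule, and the policy of always taking the YouTube word on a mismatch are assumptions made for illustration (the actual program only "often" prefers YouTube). It uses the standard-library `difflib.SequenceMatcher` to align the two word lists.

```python
from difflib import SequenceMatcher


def merge_words(whisper_words, youtube_words):
    """Align two word lists; where they disagree, take the YouTube reading.

    Hypothetical helper: a simplification of the behavior described in the
    commit message, not the author's implementation.
    """
    def norm(word):
        # Compare case-insensitively and ignore surrounding punctuation.
        return word.lower().strip(".,!?;:\"")

    matcher = SequenceMatcher(
        a=[norm(w) for w in whisper_words],
        b=[norm(w) for w in youtube_words],
        autojunk=False,
    )
    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both sources agree: keep Whisper's casing and punctuation.
            merged.extend(whisper_words[i1:i2])
        elif tag == "delete":
            # Word appears only in the Whisper output.
            merged.extend(whisper_words[i1:i2])
        else:
            # "replace" (disagreement) or "insert" (YouTube-only words):
            # assumed policy is to take the YouTube reading.
            merged.extend(youtube_words[j1:j2])
    return merged


whisper = ["The", "free", "energy", "principal", "applies."]
youtube = ["the", "free", "energy", "principle", "applies"]
print(" ".join(merge_words(whisper, youtube)))
```

A real implementation would also need to carry the SRT timestamps along with each word; this sketch aligns text only.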
In addition, Whisper suppresses time-fillers like "um," "uh," and some instances of "you know," "I think," and repetitions; no such suppression occurs in auto-generated YouTube CCs. Per Active Inference Journal convention, these superficially unimportant vocal gestures are to be: (a) retained in the Gold Standard version of each audiovisual session; (b) enclosed in {curly braces} in SRT closed captioning; (c) deleted in "printable journal" articles.
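The three output conventions (a) to (c) can be sketched as follows. This is an illustrative assumption, not the uploaded program: the `(word, suppressed)` token representation, the function name `render`, and the style names "gold", "srt", and "printable" are all hypothetical, and the sketch assumes the Gold Standard version keeps fillers as plain, unmarked words.

```python
from itertools import groupby


def render(tokens, style):
    """Render (word, suppressed) tokens in one of three assumed styles.

    "gold"      -> fillers retained
    "srt"       -> each run of fillers enclosed in {curly braces}
    "printable" -> fillers deleted
    """
    if style not in ("gold", "srt", "printable"):
        raise ValueError(f"unknown style: {style!r}")

    parts = []
    # Group consecutive tokens so a multi-word filler such as "you know"
    # ends up inside a single pair of braces.
    for suppressed, group in groupby(tokens, key=lambda t: t[1]):
        words = " ".join(word for word, _ in group)
        if not suppressed or style == "gold":
            parts.append(words)
        elif style == "srt":
            parts.append("{" + words + "}")
        # style == "printable": drop the suppressed run entirely.
    return " ".join(parts)


# True marks words that Whisper suppressed but the YouTube CC kept.
tokens = [("So", False), ("you", True), ("know", True),
          ("it", False), ("works", False)]
for style in ("gold", "srt", "printable"):
    print(style, "->", render(tokens, style))
```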
BazookamanPH authored Nov 28, 2022
1 parent 661b8ce commit e7cdbfb
Showing 1 changed file with 1,560 additions and 0 deletions.