Commit
The incomplete Python program "richerSentenceWords.py" is an enhanced version of "sentencesToTranscripts.py"; it takes an additional Google/YouTube word-by-word CC SRT transcript as supplemental input. "richerSentenceWords.py" matches the YouTube SRT to the /SENTENCES output from OpenAI/Whisper. Where the two disagree, it often uses the YouTube reading, which is often more accurate than Whisper's guess. In addition, Whisper suppresses time-fillers like "um," "uh," and some instances of "you know," "I think," and repetitions; no such suppression occurs in auto-generated YouTube CCs. Per Active Inference Journal convention, these superficially unimportant vocal gestures are to be: (a) retained in the Gold Standard version of each audiovisual session; (b) enclosed in {curly braces} in SRT closed captioning; (c) deleted in "printable journal" articles.
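The convention described above can be sketched as follows. This is only an illustrative sketch, not the code in "richerSentenceWords.py": the function names `flag_suppressed` and `render` are hypothetical, the word alignment uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever matching the program actually performs, and it assumes both transcripts have already been reduced to lowercase word lists with timestamps stripped.

```python
import difflib


def flag_suppressed(whisper_words, youtube_words):
    """Align two word lists; return (word, suppressed) pairs.

    A word is flagged as suppressed when it appears in the YouTube
    transcript but has no counterpart in the Whisper output.
    Hypothetical helper, not taken from the original program.
    """
    matcher = difflib.SequenceMatcher(
        a=whisper_words, b=youtube_words, autojunk=False
    )
    tokens = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            tokens += [(w, False) for w in youtube_words[j1:j2]]
        elif tag == "insert":
            # Present only in YouTube: treat as a filler Whisper dropped.
            tokens += [(w, True) for w in youtube_words[j1:j2]]
        elif tag == "replace":
            # Discrepancy: this sketch always prefers the YouTube reading.
            tokens += [(w, False) for w in youtube_words[j1:j2]]
        else:  # "delete": present only in Whisper; keep it unchanged.
            tokens += [(w, False) for w in whisper_words[i1:i2]]
    return tokens


def render(tokens, mode):
    """Render tokens as 'gold', 'srt', or 'journal' text."""
    if mode not in ("gold", "srt", "journal"):
        raise ValueError(f"unknown mode: {mode}")
    out = []
    i = 0
    while i < len(tokens):
        word, suppressed = tokens[i]
        if not suppressed:
            out.append(word)
            i += 1
            continue
        # Gather the whole run of consecutive suppressed words.
        j = i
        while j < len(tokens) and tokens[j][1]:
            j += 1
        phrase = " ".join(w for w, _ in tokens[i:j])
        if mode == "gold":
            out.append(phrase)              # (a) retained
        elif mode == "srt":
            out.append("{" + phrase + "}")  # (b) enclosed in braces
        # (c) "journal": the phrase is dropped entirely
        i = j
    return " ".join(out)


whisper = "so it works".split()
youtube = "so um you know it works".split()
tokens = flag_suppressed(whisper, youtube)
print(render(tokens, "gold"))     # so um you know it works
print(render(tokens, "srt"))      # so {um you know} it works
print(render(tokens, "journal"))  # so it works
```

Note that this sketch wraps each run of adjacent suppressed words in a single pair of braces; whether the real program braces "um" and "you know" separately is not stated in the commit message.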