Fast-forward through non-speaking segments of videos using voice activity detection (VAD) instead of volume detection.
The issue with most unsilencers is that they rely on sound volume, not speech detection, so loud background noise can be kept and quiet speech can be skipped. VAD filters address this.
Fast-forwarding through non-speaking segments instead of cutting them out prevents pops and clicks at transitions.
Silero VAD: a pre-trained, enterprise-grade voice activity detector (Silero also publishes STT models).
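
To illustrate the fast-forward idea described above, here is a minimal sketch, not the tool's actual code. It assumes a VAD has already produced speech segments as (start, end) pairs in seconds, and it builds a playback plan where speech plays at normal speed and everything else is sped up. The names `build_speed_plan`, `output_duration`, and the `silent_speed` value of 6.0 are hypothetical and chosen only for this example.

```python
from typing import List, Tuple

Segment = Tuple[float, float]           # (start_s, end_s) of detected speech
PlanEntry = Tuple[float, float, float]  # (start_s, end_s, playback_speed)


def build_speed_plan(
    speech: List[Segment],
    duration: float,
    silent_speed: float = 6.0,  # hypothetical value, for illustration only
) -> List[PlanEntry]:
    """Cover the whole video with intervals: 1x for speech, faster elsewhere."""
    plan: List[PlanEntry] = []
    cursor = 0.0  # end of the last interval already added to the plan
    for start, end in sorted(speech):
        # Clamp to the video bounds and skip anything already covered.
        start = max(start, cursor)
        end = min(end, duration)
        if end <= start:
            continue
        if start > cursor:
            # Gap before this speech segment: fast-forward, do not cut.
            plan.append((cursor, start, silent_speed))
        plan.append((start, end, 1.0))
        cursor = end
    if cursor < duration:
        # Trailing non-speaking section after the last speech segment.
        plan.append((cursor, duration, silent_speed))
    return plan


def output_duration(plan: List[PlanEntry]) -> float:
    """Length of the resulting video once each interval is re-timed."""
    return sum((end - start) / speed for start, end, speed in plan)


# Example: a 15 s clip with speech at 2-5 s and 9-12 s.
example_plan = build_speed_plan([(2.0, 5.0), (9.0, 12.0)], duration=15.0)
for entry in example_plan:
    print(entry)
print(round(output_duration(example_plan), 2))
```

Because every interval is kept and only re-timed, nothing is spliced out of the audio, which is the reason fast-forwarding avoids the hard cuts that cause pops and clicks.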
https://github.com/samfisherirl/Unsilence_GUI/releases/download/v1/unsilencer.zip