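I only recently started making videos, and I'm honestly not sure whether subtitles should be split character by character. But my videos contain some small mistakes that I'd like to cut out, and I'd like to take a shortcut. Is it possible to configure the model to segment per character or per word?
If not, I have a rough idea: first run recognition at the normal segment length, then split the sentences, and finally match the split pieces back to the audio to get their timestamps. I'd quite like to implement this, but I'm not sure whether it's necessary.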
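The idea above can be sketched roughly as follows. This is only an illustration under a strong assumption: instead of truly matching each piece against the audio (which would need a forced-alignment model), it spreads the segment's duration evenly across its characters. The names `Cue` and `split_segment` are hypothetical and not part of this project.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    text: str
    start: float  # seconds
    end: float    # seconds


def split_segment(text: str, start: float, end: float) -> list[Cue]:
    """Split one recognized segment into per-character cues.

    ASSUMPTION: time is divided evenly between characters. Real speech
    is not evenly paced, so this is only a placeholder for proper
    audio alignment.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return []
    step = (end - start) / len(chars)
    return [
        Cue(c, start + i * step, start + (i + 1) * step)
        for i, c in enumerate(chars)
    ]
```

For example, `split_segment("你好吗", 1.0, 2.5)` yields three cues of 0.5 seconds each. The even split will drift on long segments, which is why word-level timestamps from the recognizer itself are likely the better route.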
Whisper's newly released API supports word-level timestamps.
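Once word-level timestamps are available, cutting out small mistakes becomes a matter of choosing which words to drop. The sketch below is a hedged illustration: it assumes each word arrives as a dict with `word`, `start`, and `end` keys (the comment does not specify the format), and `keep_ranges` is a hypothetical helper, not something from this project.

```python
def keep_ranges(words: list[dict], drop: set[int]) -> list[tuple[float, float]]:
    """Return the time ranges to keep after dropping the words at the
    indices in `drop`. Runs of consecutive kept words are merged into
    one range.

    ASSUMPTION: `words` is ordered by time and each entry looks like
    {"word": str, "start": float, "end": float}.
    """
    ranges: list[tuple[float, float]] = []
    prev_kept = False
    for i, w in enumerate(words):
        if i in drop:
            prev_kept = False
            continue
        if prev_kept:
            # Extend the current range to cover this word.
            ranges[-1] = (ranges[-1][0], w["end"])
        else:
            ranges.append((w["start"], w["end"]))
        prev_kept = True
    return ranges
```

The resulting ranges could then be handed to a video tool to keep only those spans. How the cut is actually performed is outside this sketch.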
https://github.com/linto-ai/whisper-timestamped already implements this, so it just needs a contributor to adapt it for this project. @mli @yihong0618 @zcf0508
See also https://github.com/m-bain/whisperX for reference.
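It feels like the best tool of this kind right now is the smart talking-head editing feature in 剪映 (Jianying); its character-level editing is really pleasant to use. I'm looking forward to this project supporting this feature in the future.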