Streaming-LLM? #1348
regularfry started this conversation in Ideas
https://github.com/mit-han-lab/streaming-llm is a neat-looking approach for extending the apparent context length of an LLM, but the way it does it made me think that it might be a better approach to streaming whisper than what's currently in examples/stream. It's a moderately invasive update to the model itself, but it looks fairly formulaic.

Has anyone got any intuition (or, better, any code) that might inform whether a) it could work for whisper.cpp; and b) it might reduce latency overall?
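
For context, the core of the StreamingLLM trick is an eviction policy over the transformer's KV cache: the paper observes that attention concentrates heavily on the first few tokens of the stream ("attention sinks"), so a plain sliding window that evicts them degrades badly, whereas keeping those few initial tokens plus a rolling window of recent tokens stays stable over very long streams. Below is a minimal sketch of just that eviction policy in isolation, to make the idea concrete; the names (kv_keep_indices, n_sink, n_window) are hypothetical and this is not whisper.cpp code.

```cpp
// Sketch of the StreamingLLM cache policy: retain a few "attention sink"
// tokens from the start of the stream plus a rolling window of the most
// recent tokens, evicting everything in between. Hypothetical names,
// not whisper.cpp API.
#include <cstdio>
#include <vector>

// Return the KV-cache positions to retain once the cache holds n_past
// tokens. Positions not returned would be evicted.
std::vector<int> kv_keep_indices(int n_past, int n_sink, int n_window) {
    std::vector<int> keep;
    if (n_past <= n_sink + n_window) {
        // Cache still fits; nothing to evict yet.
        for (int i = 0; i < n_past; ++i) keep.push_back(i);
        return keep;
    }
    for (int i = 0; i < n_sink; ++i)                  // attention sinks
        keep.push_back(i);
    for (int i = n_past - n_window; i < n_past; ++i)  // recent window
        keep.push_back(i);
    return keep;
}

int main() {
    // e.g. 4 sink tokens + a 1024-token window over a 2000-token stream:
    auto keep = kv_keep_indices(/*n_past=*/2000, /*n_sink=*/4, /*n_window=*/1024);
    std::printf("kept %zu of 2000 cache entries\n", keep.size());
    std::printf("kept %d..%d, then %d..%d\n",
                keep.front(), keep[3], keep[4], keep.back());
    return 0;
}
```

The other half of the technique, and the "moderately invasive" part, is that the retained entries are given contiguous positions within the cache rather than their absolute positions in the stream, so the positional encoding never sees indices beyond the cache size.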