[KV Cache] Overwrite Cache - SW Attention #297
src/runtime/relax_vm/lm_support.cc
Outdated
@@ -159,6 +232,7 @@ class AttentionKVCache : public ObjectRef {
n->Append(init_data);
if (init_fill_count >= 0) {
n->fill_count = init_fill_count;
n->current_pos = init_fill_count; // sliding window attention only |
window_attention_current_pos
src/runtime/relax_vm/lm_support.cc
Outdated
* \brief Append value to the cache.
* \param value The value to overwrite previous elements.
*/
void Overwrite(NDArray value, int64_t max_cache_size) { |
WindowOverride
Thanks @davidpissarra. The main comment is that, given the impl is specialized to window attention, let us make sure the API naming highlights that fact.
Please also send the PR to the unity branch of https://github.com/apache/tvm.
Part of the effort on Sliding Window Attention (SWA), mlc-ai/mlc-llm#1003. Overwriting the cache is useful when computing SWA because it gives a more efficient cache that holds only the keys and values of the current window. Once the cache is full, new entries start overwriting the oldest ones.
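
For readers skimming the thread, the overwrite behaviour described above can be sketched as a ring buffer. This is only an illustration, not the code in `lm_support.cc`: the class `WindowKVCacheSketch` is hypothetical, it stores one `float` per token position instead of an `NDArray` slice, and the names `WindowOverride` and `window_attention_current_pos` are borrowed from the review suggestions rather than from the merged API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the window cache discussed in this PR.
// Each entry is a single float per token position (assumption for brevity).
class WindowKVCacheSketch {
 public:
  explicit WindowKVCacheSketch(int64_t max_cache_size)
      : data_(static_cast<std::size_t>(max_cache_size)),
        max_cache_size_(max_cache_size) {
    assert(max_cache_size > 0);
  }

  // Write new entries at the current window position. Until the cache is
  // full this behaves like an append; afterwards it wraps around and
  // overwrites the oldest entries.
  void WindowOverride(const std::vector<float>& values) {
    for (float v : values) {
      data_[static_cast<std::size_t>(window_attention_current_pos_)] = v;
      window_attention_current_pos_ =
          (window_attention_current_pos_ + 1) % max_cache_size_;
      if (fill_count_ < max_cache_size_) ++fill_count_;
    }
  }

  int64_t fill_count() const { return fill_count_; }
  int64_t window_attention_current_pos() const {
    return window_attention_current_pos_;
  }
  const std::vector<float>& data() const { return data_; }

 private:
  std::vector<float> data_;
  int64_t max_cache_size_;
  int64_t fill_count_ = 0;                    // number of valid entries
  int64_t window_attention_current_pos_ = 0;  // next slot to write
};
```

With a window of 4, writing three entries and then three more leaves the cache full, with the two oldest slots overwritten and the write position at index 2. The fill count saturates at the window size, so attention only ever sees the current window.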