TriePagedAttentionCache #632

Merged
merged 23 commits into from
Dec 4, 2024

renxida (Contributor) commented Dec 2, 2024

feat: Add TriePagedAttentionCache with initial implementation

Added TriePagedAttentionCache as an optional prefix sharing algorithm, selectable via:
config["paged_kv_cache"]["prefix_sharing_algorithm"] = "trie"
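A minimal sketch of how a config key like this could select the cache class. The two classes below are empty placeholders rather than the real implementations, and the `"none"` default key, the `_ALGORITHMS` table, and `select_cache_class` are hypothetical names used only for illustration:

```python
# Placeholder classes: the real caches in this PR have their own constructors
# and methods, which are not reproduced here.
class BasePagedAttentionCache:
    """Stand-in for the default cache (no prefix sharing)."""


class TriePagedAttentionCache(BasePagedAttentionCache):
    """Stand-in for the trie-based prefix-sharing cache."""


# Assumed mapping from config value to class; the "none" key is an assumption.
_ALGORITHMS = {
    "none": BasePagedAttentionCache,
    "trie": TriePagedAttentionCache,
}


def select_cache_class(config: dict) -> type:
    """Return the cache class named by the config, defaulting to the base cache."""
    algorithm = config.get("paged_kv_cache", {}).get(
        "prefix_sharing_algorithm", "none"
    )
    try:
        return _ALGORITHMS[algorithm]
    except KeyError:
        # Fail loudly on typos instead of silently falling back to the default.
        raise ValueError(
            f"unknown prefix_sharing_algorithm: {algorithm!r}"
        ) from None
```

Keeping the base cache as the fallback matches the status below: the trie path is opt-in.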

Current Status:

  • Basic implementation and unit tests complete
  • Integration tests cover both the Base and Trie implementations; the Trie tests are marked xfail until the pending cache allocation improvements land
  • BasePagedAttentionCache remains the default

Next Steps:
To achieve full functionality, the cache needs to support re-allocation, so that an existing allocation can be extended with additional tokens and pages.
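For readers unfamiliar with the idea, here is a rough sketch of trie-based prefix sharing and of what "extending" an allocation could mean. It is not the code in this PR: `TrieNode`, `PrefixTrie`, `match`, `extend`, and the integer page counter are all hypothetical, and a real cache would draw pages from a pool and handle eviction and reference counting.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrieNode:
    # Page holding the KV entries for this node's token block (None at the root).
    page_id: Optional[int] = None
    # Children keyed by the next page-sized block of token ids.
    children: dict = field(default_factory=dict)


class PrefixTrie:
    """Toy trie keyed by full pages of tokens; illustrative only."""

    def __init__(self, tokens_per_page: int):
        self.tokens_per_page = tokens_per_page
        self.root = TrieNode()
        self._next_page_id = 0  # stand-in for a real page pool

    def _blocks(self, tokens):
        # Split into full pages only; a trailing partial page is not shareable.
        n = self.tokens_per_page
        full = len(tokens) - len(tokens) % n
        return [tuple(tokens[i:i + n]) for i in range(0, full, n)]

    def match(self, tokens):
        """Return page ids for the longest already-cached prefix of `tokens`."""
        node, pages = self.root, []
        for block in self._blocks(tokens):
            child = node.children.get(block)
            if child is None:
                break
            pages.append(child.page_id)
            node = child
        return pages

    def extend(self, tokens):
        """Reuse cached pages for the shared prefix and add pages for the rest."""
        node, pages = self.root, []
        for block in self._blocks(tokens):
            child = node.children.get(block)
            if child is None:
                child = TrieNode(page_id=self._next_page_id)
                self._next_page_id += 1
                node.children[block] = child
            pages.append(child.page_id)
            node = child
        return pages
```

With `tokens_per_page=2`, calling `extend([1, 2, 3, 4, 5])` on a fresh trie returns `[0, 1]`, and a later `extend([1, 2, 7, 7])` returns `[0, 2]`: the first page is shared and only the diverging block gets a new page.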

@renxida renxida marked this pull request as ready for review December 2, 2024 22:24
@renxida renxida requested a review from stbaione December 2, 2024 22:24
renxida (Contributor, Author) commented Dec 2, 2024

[image]

:D

stbaione (Contributor) left a comment

I think this could benefit from another set of eyes, but the overall interface and operations make sense to me.

@renxida renxida enabled auto-merge (squash) December 4, 2024 04:56
@renxida renxida merged commit de4d2fe into nod-ai:main Dec 4, 2024
13 of 20 checks passed
3 participants