Shortfin Tasks tracker #647

Open
1 of 21 tasks
renxida opened this issue Dec 4, 2024 · 0 comments

renxida commented Dec 4, 2024

  • stress testing shortfin
    • prereqs
      - [ ] figure out an interface for specifying the cache management algorithm (trie or base). Ideally we can even hot-swap it by specifying it in the incoming HTTP request.
      • refactor CI so that Base and Trie runs can reuse the same model artifacts
    • tests needed
      • test models
        • start with toy llama model that Rob has
      • test sequences
        • repeat the same prompt 100 times, and run it on both Base and Trie. Trie should be close to 100x faster because it skips prefill. If I screwed up the cache matching, Trie would be slower.
        • prompts forking at various locations
      • things to track over all test cases
        • output token consistency between base and trie
        • performance comparison between base and trie.
          • total time between sending the first request and receiving the output of the last request
          • timeline of sending requests & receiving responses. This should be helpful for tracking down performance problems later on
  • sharding
    • GPU first
      • sharding is not useful on CPU, and in the past we've encountered problems unique to CPU. If we're trying to make GPU work, there's not much reason to wade through those. If we are stuck on GPU-specific issues & there is no more important work to do, THEN we should try sharding on CPU
  • ux improvements
  • [ ]
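
A rough sketch of how the stress-test items above could fit together. Everything here is hypothetical: the `generate(prompt, algorithm)` callable stands in for whatever client ends up talking to shortfin, and passing the cache algorithm as a per-request argument assumes the interface from the prereqs, which is not designed yet. Requests are sent one at a time to keep the sketch short; a real stress test would likely send them concurrently.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical client signature: (prompt, cache_algorithm) -> output token ids.
GenerateFn = Callable[[str, str], List[int]]


@dataclass
class RunResult:
    outputs: List[List[int]] = field(default_factory=list)
    # (request index, send time, receive time), in seconds since the first send
    timeline: List[Tuple[int, float, float]] = field(default_factory=list)
    total_seconds: float = 0.0


def repeated_prompts(prompt: str, n: int = 100) -> List[str]:
    """Test sequence: the same prompt n times."""
    return [prompt] * n


def forking_prompts(base_words: List[str], fork_points: List[int]) -> List[str]:
    """Test sequence: prompts sharing a prefix and diverging at each fork point."""
    return [" ".join(base_words[:k] + [f"fork{k}"]) for k in fork_points]


def run_sequence(generate: GenerateFn, prompts: List[str], algorithm: str) -> RunResult:
    """Send prompts sequentially, recording outputs and a send/receive timeline."""
    result = RunResult()
    start = time.perf_counter()
    for i, prompt in enumerate(prompts):
        sent = time.perf_counter() - start
        tokens = generate(prompt, algorithm)
        received = time.perf_counter() - start
        result.outputs.append(tokens)
        result.timeline.append((i, sent, received))
    # Time from sending the first request to receiving the last output.
    result.total_seconds = time.perf_counter() - start
    return result


def compare(generate: GenerateFn, prompts: List[str]) -> Dict[str, object]:
    """Run the same prompts on Base and Trie and report the tracked metrics."""
    base = run_sequence(generate, prompts, "base")
    trie = run_sequence(generate, prompts, "trie")
    return {
        "outputs_match": base.outputs == trie.outputs,
        "base_seconds": base.total_seconds,
        "trie_seconds": trie.total_seconds,
        "base_timeline": base.timeline,
        "trie_timeline": trie.timeline,
    }
```

With a real client plugged in as `generate`, `compare(generate, repeated_prompts(prompt))` would cover the repeated-prompt case and `compare(generate, forking_prompts(words, fork_points))` the forking case. If `outputs_match` is false or `trie_seconds` exceeds `base_seconds` on the repeated-prompt run, that would point at the cache matching.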