Shortfin LLM Docs #481
Great start, thanks!
Slight adjustments to the user e2e doc
Removing markdown linter. Pre-existing md files don't pass, and it would clutter the PR to change all of them
Remove markdownlint pre-commit step
Remove references to `$PORT` env var; comment out TODOs so that they only appear in raw view
Great improvement. We can iterate on it after testing. Can go ahead and land it.
Description
The following docs outline how to export and compile a Llama 8b f16 decomposed model, then run the Shortfin LLM Server with the compiled model.
It includes docs for both a developer flow and a user flow.
There are a couple of TODOs that can be updated/fixed as we make patches in shortfin and/or sharktank.