> [!IMPORTANT]
> Development is still in progress for several project components. See the notes below for which workflows are best supported.
The shortfin sub-project is SHARK's high-performance inference library and serving engine.
- API documentation for shortfin is available on readthedocs.
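As a quick illustration of the serving side, here is a minimal client sketch that posts a prompt to a shortfin LLM serving app assumed to be running locally. The address, the `/generate` endpoint, and the request field names are assumptions for illustration only; see the shortfin documentation and the serving app guides for the exact server invocation and request schema.

```python
# Minimal sketch of a client for a locally running shortfin LLM serving app.
# NOTE: the host/port, the /generate endpoint, and the JSON field names are
# assumptions for illustration; check the shortfin docs for the real schema.
import json
import urllib.request

SERVER = "http://localhost:8000"  # assumed address of the local serving app


def generate(prompt: str, max_tokens: int = 64) -> str:
    """Send a single generation request and return the raw response body."""
    payload = json.dumps({
        "text": prompt,                    # assumed prompt field
        "sampling_params": {               # assumed sampling options
            "max_completion_tokens": max_tokens,
        },
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{SERVER}/generate",              # assumed endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(generate("What is the capital of France?"))
```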
The SHARK Tank sub-project contains a collection of model recipes and conversion tools to produce inference-optimized programs.
> [!WARNING]
> SHARK Tank is still under development. Experienced users may want to try it out, but we currently recommend most users download pre-exported or pre-compiled model files for serving with shortfin.
- See the SHARK Tank Programming Guide for information about core concepts, the development model, dataset management, and more.
- See Direct Quantization with SHARK Tank for information about quantization support.
The Tuner sub-project assists with tuning program performance by searching for optimal parameter configurations to use during model compilation.
Model name | Model recipes | Serving apps |
---|---|---|
SDXL | sharktank/sharktank/models/punet/ | shortfin/python/shortfin_apps/sd/ |
llama | sharktank/sharktank/models/llama/ | shortfin/python/shortfin_apps/llm/ |
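The serving app directories in the table correspond to Python packages under `shortfin_apps`. The sketch below checks whether those packages resolve in the current environment before attempting to launch a server; the package names are inferred from the directory layout above, so confirm them against the serving app documentation.

```python
# Quick availability check for the serving apps listed in the table above.
# NOTE: the package names below are inferred from the directory layout
# (shortfin/python/shortfin_apps/...); verify them against the docs.
import importlib.util

SERVING_APPS = {
    "llama": "shortfin_apps.llm",  # assumed package for the llama serving app
    "SDXL": "shortfin_apps.sd",    # assumed package for the SDXL serving app
}


def app_available(module_name: str) -> bool:
    """Return True if the serving app package can be found without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


for model, module_name in SERVING_APPS.items():
    status = "available" if app_available(module_name) else "not installed"
    print(f"{model}: {module_name} is {status}")
```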
If you're looking to use SHARK, check out our User Guide.
If you're looking to develop SHARK, check out our Developer Guide.