
Release v3.0.0

ScottTodd released this 06 Nov 19:13
· 88 commits to main since this release
4770759

This release marks public availability for the SHARK AI project, with a focus on serving the Stable Diffusion XL model on AMD Instinct™ MI300X Accelerators.

Highlights

shark-ai

The shark-ai package is the recommended entry point for using the project. This meta package installs compatible versions of all relevant sub-projects.
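
As a quick check after installation, the versions that the meta package pulled in can be listed with the standard library. This is a minimal sketch: the helper name `installed_versions` is hypothetical, and apart from `shark-ai` the distribution names passed to it (for example `shortfin`) are assumptions about how the sub-projects are published.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Distribution names other than "shark-ai" are assumptions; adjust as needed.
for name, version in installed_versions(["shark-ai", "shortfin"]).items():
    print(f"{name}: {version or 'not installed'}")
```

A `None` entry simply means that distribution was not found in the active virtual environment.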

shortfin

The shortfin sub-project is SHARK's high-performance inference library and serving engine.

Key features:

  • Fast inference using ahead-of-time model compilation powered by IREE
  • Throughput optimization via request batching and support for flexible device topologies
  • Asynchronous execution and efficient threading
  • Example applications for supported models
  • APIs available in Python and C
  • Detailed profiling support
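
To illustrate what request batching means in general, here is a small asyncio sketch that groups concurrent requests into one batch before calling a model function. It is not shortfin's implementation or API: `batch_worker`, `submit`, `fake_model`, and the `max_batch_size` and `max_wait_s` parameters are all hypothetical names chosen for this example.

```python
import asyncio


async def batch_worker(queue, handle_batch, max_batch_size=4, max_wait_s=0.01):
    """Group queued (item, future) pairs into batches and resolve each future."""
    loop = asyncio.get_running_loop()
    while True:
        # Block until at least one request is available.
        batch = [await queue.get()]
        deadline = loop.time() + max_wait_s
        # Keep collecting until the batch is full or the wait budget runs out.
        while len(batch) < max_batch_size:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        # One call processes the whole batch; results come back in order.
        results = await handle_batch([item for item, _ in batch])
        for (_, future), result in zip(batch, results):
            future.set_result(result)


async def submit(queue, item):
    """Enqueue one request and wait for its individual result."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((item, future))
    return await future


async def demo():
    queue = asyncio.Queue()
    batch_sizes = []

    async def fake_model(prompts):
        # Stand-in for a real batched inference call.
        batch_sizes.append(len(prompts))
        return [p.upper() for p in prompts]

    worker = asyncio.create_task(batch_worker(queue, fake_model))
    results = await asyncio.gather(*(submit(queue, p) for p in ["a", "b", "c"]))
    worker.cancel()
    return results, batch_sizes
```

The trade-off in this sketch is between latency and throughput: a larger `max_wait_s` gives more requests a chance to share a batch, at the cost of making the first request in each batch wait longer.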

For this release, shortfin uses precompiled programs built by the SHARK team using the sharktank sub-project. Future releases will streamline the model conversion process, add user guides, and enable adventurous users to bring their own custom models.

Current shortfin system requirements:

Serving Stable Diffusion XL (SDXL) on MI300X

See the user guide for the latest instructions.

To serve the Stable Diffusion XL model, which generates output images given input text prompts:

# Set up a Python virtual environment.
python -m venv .venv
source .venv/bin/activate
# Optional: faster installation of torch with just CPU support.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install shark-ai, including extra dependencies for apps.
pip install "shark-ai[apps]"

# Start the server then wait for it to download artifacts.
python -m shortfin_apps.sd.server \
  --device=amdgpu --device_ids=0 --topology="spx_single" \
  --build_preference=precompiled
# (wait for setup to complete)
# INFO - Application startup complete.
# INFO - Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

# Run the interactive client, sending text prompts and receiving generated images back.
python -m shortfin_apps.sd.simple_client --interactive
# Enter a prompt: a single cybernetic shark jumping out of the waves set against a technicolor sunset
# Sending request with prompt: ['a single cybernetic shark jumping out of the waves set against a technicolor sunset']
# Sending request batch # 0
# Saving response as image...
# Saved to gen_imgs/shortfin_sd_output_2024-11-15_16-30-30_0.png
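
Because the server needs time to download artifacts before it starts listening, a script that launches the client automatically may want to wait until the port accepts connections. The following is a minimal sketch using only the standard library. The helper name `wait_for_port` and its defaults are hypothetical, port 8000 is taken from the Uvicorn log line above, and an open port is only a proxy for readiness, not a confirmation that the application has finished starting.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=600.0, interval_s=1.0):
    """Return True once a TCP connection to (host, port) succeeds, else False."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, calling `wait_for_port("localhost", 8000)` before starting the client would block until the server is reachable or ten minutes have passed.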

[Image: example generated output, shortfin_sd_output_2024-11-18_11-49-24_0_resize300]

Roadmap

This release is just the start of a longer journey. The SHARK platform is fully open source, so stay tuned for future developments. Here is a taste of what we have planned:

  • Support for a wider range of ML models, including popular LLMs
  • Performance improvements and optimized implementations for supported models across a wider range of devices
  • Integrations with other popular frameworks and APIs
  • General availability and user guides for the sharktank model development toolkit