# Deploying and Serving LLMs with BigDL-LLM

BigDL-LLM is a library for running LLMs (large language models) on Intel XPU (from laptop to GPU to cloud) with very low latency, using INT4 quantization (for any PyTorch model).

The integration with BigDL-LLM currently supports running on Intel CPU only.

## Setup

Please follow setup.md to set up the environment first. Additionally, you will need to install the BigDL-LLM dependencies as shown below.

```bash
pip install .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
```
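To confirm the dependencies were installed correctly, a quick sanity check like the following can be run (this is just an illustrative snippet, not a script shipped with the repo):

```python
# Sanity check: verify that BigDL-LLM is importable in the current environment.
from importlib.metadata import version

import bigdl.llm  # noqa: F401  # raises ImportError if BigDL-LLM is not installed

print("bigdl-llm version:", version("bigdl-llm"))
```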

## Configure Serving Parameters

Please follow the serving document for configuring the parameters. In the configuration file, you need to set `bigdl` and `load_in_4bit` to true. Example configuration files for enabling bigdl-llm are available [here](../inference/models/bigdl).

```yaml
  bigdl: true
  config:
    load_in_4bit: true
```
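For context, `load_in_4bit` corresponds to BigDL-LLM's Transformers-style loading API, which quantizes the model weights to INT4 at load time. The sketch below illustrates that API directly; the model id is only a placeholder, and the serving backend's exact loading code may differ.

```python
# Minimal sketch of BigDL-LLM's 4-bit loading path (placeholder model id).
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; use your own model
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is BigDL-LLM?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```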

## Deploy and Test

Please follow the serving document for deploying and testing.
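Once the service is up, a simple smoke test can be sent over HTTP. The URL, route, and payload format below are assumptions for illustration only; use the actual endpoint and request schema described in the serving document and your configuration file.

```python
# Hypothetical smoke test: the route and payload schema are placeholders.
import requests

url = "http://127.0.0.1:8000/my-model"  # placeholder route from the serving config
payload = {"text": "What is BigDL-LLM?", "config": {"max_new_tokens": 64}}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.text)
```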