BigDL-LLM is a library for running LLMs (large language models) on Intel XPU (from laptop to GPU to cloud) using INT4 with very low latency, for any PyTorch model.
The BigDL-LLM integration currently only supports running on Intel CPUs.
Please follow setup.md to set up the environment first. Additionally, you will need to install the BigDL dependencies as shown below.
```bash
pip install .[bigdl-cpu] -f https://developer.intel.com/ipex-whl-stable-cpu -f https://download.pytorch.org/whl/torch_stable.html
```
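Once installed, BigDL-LLM can load Hugging Face Transformers models in INT4 through its drop-in `AutoModelForCausalLM` wrapper. The snippet below is a minimal sketch of that loading path; the model id used here is only an illustrative placeholder, and in this repo the same behavior is driven through the serving configuration described in the next section.

```python
# Minimal sketch of BigDL-LLM's INT4 loading path (illustrative only;
# the model id below is a placeholder, not something this repo requires).
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"

# load_in_4bit=True quantizes the weights to INT4 while loading on CPU.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is BigDL-LLM?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```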
Please follow the serving document for configuring the parameters. In the configuration file, you need to set `bigdl` and `load_in_4bit` to true, as shown below. Example configuration files for enabling bigdl-llm are available [here](../inference/models/bigdl).
```yaml
bigdl: true
config:
  load_in_4bit: true
```
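In these example files, the keys under `config` are typically passed through as keyword arguments when the model is loaded, so `load_in_4bit: true` enables the INT4 loading path sketched above.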
Please follow the serving document for deploying and testing.