- Clone this repository:

```bash
git clone https://github.com/continuedev/ggml-server-example
```

- Move into the folder:

```bash
cd ggml-server-example
```

- Create a virtual environment:

```bash
python3 -m venv env
```

- Activate the virtual environment:

```bash
source env/bin/activate
```

  On Windows, use `env\Scripts\activate.bat`; if using the fish shell, use `source env/bin/activate.fish`.

- Install required packages:

```bash
pip install -r requirements.txt
```
- Download a model to the `models/` folder. Here is a convenient source of models that can be downloaded: https://huggingface.co/TheBloke
- For example, download the 4-bit quantized WizardLM-7B (we recommend this model): https://huggingface.co/TheBloke/wizardLM-7B-GGML/blob/main/wizardLM-7B.ggmlv3.q4_0.bin
- Run the server with:

```bash
python3 -m llama_cpp.server --model models/wizardLM-7B.ggmlv3.q4_0.bin
```
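Once the server is up, you can sanity-check it from Python. This is a minimal sketch assuming the OpenAI-compatible completions endpoint (`/v1/completions`) that `llama_cpp.server` exposes, at its default address of `http://localhost:8000`:

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8000"  # default address for llama_cpp.server


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the server's /v1/completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{SERVER_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text from the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

With the server running, `print(complete("Q: What is a ggml model? A:"))` should print a short generated answer.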
- To set this as your default model in Continue, open `~/.continue/config.json` either manually or using the `/config` slash command in Continue. Then, import the `GGML` class (`from continuedev.src.continuedev.libs.llm.ggml import GGML`), set `default_model=GGML(max_context_length=2048)`, reload your VS Code window, and you're good to go!
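Put together, the edit described above amounts to adding lines like these to the config file. This is a sketch assembled only from the import and assignment quoted in this step; the exact field name and surrounding config structure may differ between Continue versions:

```python
# Sketch of the relevant lines in the Continue config file
# (field name `default_model` is taken from the instructions above).
from continuedev.src.continuedev.libs.llm.ggml import GGML

default_model = GGML(max_context_length=2048)
```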
Happy to help! Email us at hi@continue.dev.