- Clone this repository:

```bash
git clone https://github.com/continuedev/ggml-server-example
```

- Move into the folder:

```bash
cd ggml-server-example
```

- Create a virtual environment:

```bash
python3 -m venv env
```

- Activate the virtual environment:

```bash
source env/bin/activate
```

  On Windows, use `env\Scripts\activate.bat`; if using the fish shell, use `source env/bin/activate.fish`.

- Install required packages:

```bash
pip install -r requirements.txt
```
- Download a model to the `models/` folder. Here is a convenient source of models that can be downloaded: https://huggingface.co/TheBloke
- For example, download the 4-bit quantized WizardLM-7B (we recommend this model): https://huggingface.co/TheBloke/wizardLM-7B-GGML/blob/main/wizardLM-7B.ggmlv3.q4_0.bin
- Run the server with:

```bash
python3 -m llama_cpp.server --model models/wizardLM-7B.ggmlv3.q4_0.bin
```
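Once the server is up, you can sanity-check it from Python. This is a minimal sketch assuming the OpenAI-compatible completions endpoint (`/v1/completions`) that `llama_cpp.server` exposes, at its default address of `http://localhost:8000`:

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8000"  # default address for llama_cpp.server


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the server's /v1/completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{SERVER_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text from the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

With the server running, `print(complete("Q: What is a ggml model? A:"))` should print a short generated answer.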
- To set this as your default model in Continue, open `~/.continue/config.json` either manually or using the `/config` slash command in Continue. Then, import the `GGML` class (`from continuedev.src.continuedev.libs.llm.ggml import GGML`), set `default_model=GGML(max_context_length=2048)`, reload your VS Code window, and you're good to go!
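Put together, the edit described above amounts to adding lines like these to the config file. This is a sketch assembled only from the import and assignment quoted in this step; the exact field name and surrounding config structure may differ between Continue versions:

```python
# Sketch of the relevant lines in the Continue config file
# (field name `default_model` is taken from the instructions above).
from continuedev.src.continuedev.libs.llm.ggml import GGML

default_model = GGML(max_context_length=2048)
```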
Happy to help! Email us at hi@continue.dev.