docs: update README on preparing quantized model #19

Closed · wants to merge 2 commits into from
17 changes: 12 additions & 5 deletions README.md
@@ -32,7 +32,10 @@ $ gem install llama_cpp -- --with-opt-dir=/opt/homebrew
## Usage

Prepare the quantized model by referring to [the usage section on the llama.cpp README](https://github.com/ggerganov/llama.cpp#usage).
-For example, preparing the quatization model based on [open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b) is as follows:
+For example, you could prepare a quantized model based on
+[open_llama_7b](https://huggingface.co/openlm-research/open_llama_7b),
+or, more usefully in the context of Ruby, a smaller model such as
+[tiny_llama_1b](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0):

```sh
$ cd ~/
@@ -42,11 +45,12 @@ $ git clone https://github.com/ggerganov/llama.cpp.git
$ cd llama.cpp
$ python3 -m pip install -r requirements.txt
$ cd models
-$ git clone https://huggingface.co/openlm-research/open_llama_7b
+$ git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
$ cd ../
-$ python3 convert.py models/open_llama_7b
+$ python3 convert-hf-to-gguf.py models/TinyLlama-1.1B-Chat-v1.0
$ make
-$ ./quantize ./models/open_llama_7b/ggml-model-f16.gguf ./models/open_llama_7b/ggml-model-q4_0.bin q4_0
+$ ./llama-quantize ./models/TinyLlama-1.1B-Chat-v1.0/ggml-model-f16.gguf \
+    ./models/TinyLlama-1.1B-Chat-v1.0/ggml-model-q4_0.bin q4_0
```
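
Before moving on to Ruby, it can be worth confirming that the quantized file loads and generates text. A minimal sanity check, assuming the `make` step above also builds the `llama-cli` binary (it does in llama.cpp versions that name the quantizer `llama-quantize`):

```sh
# Assumption: llama-cli was built alongside llama-quantize by the make step.
# Load the quantized model and generate 32 tokens from a short prompt.
$ ./llama-cli -m ./models/TinyLlama-1.1B-Chat-v1.0/ggml-model-q4_0.bin \
    -p "Hello, World." -n 32
```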

An example of Ruby code that generates sentences with the quantization model is as follows:
@@ -55,7 +59,10 @@ An example of Ruby code that generates sentences with the quantization model is
require 'llama_cpp'

model_params = LLaMACpp::ModelParams.new
-model = LLaMACpp::Model.new(model_path: '/home/user/llama.cpp/models/open_llama_7b/ggml-model-q4_0.bin', params: model_params)
+model = LLaMACpp::Model.new(
+  model_path: '/home/user/llama.cpp/models/TinyLlama-1.1B-Chat-v1.0/ggml-model-q4_0.bin',
+  params: model_params
+)

context_params = LLaMACpp::ContextParams.new
context_params.seed = 42
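
The hunk ends here; the rest of the example is collapsed in the diff view. For reference, the README example continues roughly as below, a sketch assuming the gem's `LLaMACpp::Context` and `LLaMACpp.generate` API; verify against the gem version you have installed:

```ruby
# Sketch, assuming Context.new(model:, params:) and a module-level
# generate helper exist in the installed llama_cpp gem version.
context = LLaMACpp::Context.new(model: model, params: context_params)
puts LLaMACpp.generate(context, 'Hello, World.')
```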