Added max_new_tokens as a config option to llm yaml block (zylon-ai#1317)

* added max_new_tokens as a configuration option to the llm block in settings

* Update fern/docs/pages/manual/settings.mdx

Co-authored-by: lopagela <lpglm@orange.fr>

* Update private_gpt/settings/settings.py

Add default value for max_new_tokens = 256

Co-authored-by: lopagela <lpglm@orange.fr>

* Addressed location of docs comment

* reformatting from running 'make check'

* remove default config value from settings.yaml

---------

Co-authored-by: lopagela <lpglm@orange.fr>
gianniacquisto and lopagela authored Nov 26, 2023
1 parent baf29f0 commit 9c192dd
Showing 3 changed files with 20 additions and 0 deletions.
15 changes: 15 additions & 0 deletions fern/docs/pages/installation/installation.mdx
@@ -89,6 +89,21 @@ Currently, not all the parameters of `llama.cpp` and `llama-cpp-python` are available
In case you need to customize parameters such as the number of layers loaded into the GPU, you might change
these at the `llm_component.py` file under the `private_gpt/components/llm/llm_component.py`.

##### Available LLM config options

The `llm` section of the settings allows for the following configurations:

- `mode`: how to run your llm
- `max_new_tokens`: this lets you configure the number of new tokens the LLM will generate and add to the context window (by default Llama.cpp uses `256`)

Example:

```yaml
llm:
mode: local
max_new_tokens: 256
```
If you are getting an out-of-memory error, you might also try a smaller model or stick to the
recommended models instead of custom-tuning the parameters.
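
Since generated tokens consume part of the same context window as the prompt, a rough way to reason about the two values together is sketched below (illustrative only, not part of the commit; the `3900` figure comes from `llm_component.py` further down):

```python
# Illustrative budget check: prompt tokens plus newly generated tokens
# have to fit inside the model's context window.
context_window = 3900   # value hard-coded in llm_component.py
max_new_tokens = 256    # default of the llm.max_new_tokens setting
max_prompt_tokens = context_window - max_new_tokens
print(f"Tokens left for the prompt and retrieved context: {max_prompt_tokens}")  # 3644
```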
1 change: 1 addition & 0 deletions private_gpt/components/llm/llm_component.py
@@ -31,6 +31,7 @@ def __init__(self, settings: Settings) -> None:
self.llm = LlamaCPP(
    model_path=str(models_path / settings.local.llm_hf_model_file),
    temperature=0.1,
    max_new_tokens=settings.llm.max_new_tokens,
    # llama2 has a context window of 4096 tokens,
    # but we set it lower to allow for some wiggle room
    context_window=3900,
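
For context, here is a minimal self-contained sketch of the call this diff modifies, assuming `llama-index` 0.9.x and a locally downloaded GGUF file (the model path below is hypothetical):

```python
from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local model file
    temperature=0.1,
    max_new_tokens=256,  # now sourced from settings.llm.max_new_tokens
    # llama2 has a context window of 4096 tokens,
    # but it is set lower to allow for some wiggle room
    context_window=3900,
)
print(llm.complete("What does max_new_tokens control?").text)
```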
4 changes: 4 additions & 0 deletions private_gpt/settings/settings.py
@@ -82,6 +82,10 @@ class DataSettings(BaseModel):

class LLMSettings(BaseModel):
    mode: Literal["local", "openai", "sagemaker", "mock"]
    max_new_tokens: int = Field(
        256,
        description="The maximum number of tokens that the LLM is authorized to generate in one completion.",
    )


class VectorstoreSettings(BaseModel):
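
A minimal sketch (mirroring the model above, not taken from the codebase) of how the Pydantic default behaves when the YAML omits the field:

```python
from typing import Literal
from pydantic import BaseModel, Field

class LLMSettings(BaseModel):
    mode: Literal["local", "openai", "sagemaker", "mock"]
    max_new_tokens: int = Field(
        256,
        description="The maximum number of tokens that the LLM is authorized to generate in one completion.",
    )

# Omitting max_new_tokens falls back to the default of 256;
# setting it in the llm block of settings.yaml overrides it.
print(LLMSettings(mode="local").max_new_tokens)                      # 256
print(LLMSettings(mode="local", max_new_tokens=512).max_new_tokens)  # 512
```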
