Releases: LostRuins/koboldcpp
koboldcpp-1.0.9beta
- Integrated support for GPT-2! This should theoretically also work with Cerebras models, but I have not tried those yet. This is a great way to get started, as you can now try models so tiny that even a potato CPU can run them. Here's a good one to start with: https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin, with which I can generate 100 tokens in a second.
- Upgraded embedded Kobold Lite to support a Stanford Alpaca compatible Instruct Mode, which can be enabled in settings.
- Removed all `-march=native` and `-mtune=native` flags when building the binary. Compatibility should be more consistent across different devices now.
- Fixed an incorrect flag name used to trigger the ACCELERATE library on Mac OSX. This should give greatly increased performance for OSX users running GPT-J and GPT-2 models, assuming you have ACCELERATE support.
- Added Rep Pen for GPT-J and GPT-2 models, and by extension pyg.cpp; this means repetition penalty now works similarly to the way it does in llama.cpp (see the sketch below this list).
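For illustration only, here is a minimal sketch of the llama.cpp-style repetition penalty convention (positive logits divided by the penalty, negative logits multiplied by it); the function name and parameters are assumptions, not koboldcpp's actual code.

```python
def apply_repetition_penalty(logits, recent_tokens, penalty=1.1):
    """Penalize tokens that already appeared in the recent context window.

    Follows the usual llama.cpp convention: positive logits are divided by
    the penalty and negative logits are multiplied by it, so a repeated
    token always becomes less likely regardless of its sign.
    """
    for tok in set(recent_tokens):
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits
```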
To use, download and run the koboldcpp.exe
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
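If you would rather call the API directly instead of using the browser UI, a request along these lines should work, assuming the embedded server exposes the KoboldAI-compatible `/api/v1/generate` endpoint (the parameter names below are illustrative and may differ):

```python
import json
import urllib.request

# Hypothetical example request against the local koboldcpp server.
payload = json.dumps({
    "prompt": "Once upon a time,",
    "max_length": 50,       # tokens to generate (assumed parameter name)
    "temperature": 0.7,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
    # The KoboldAI API typically returns {"results": [{"text": ...}]}.
    print(result["results"][0]["text"])
```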
koboldcpp-1.0.8beta
- Rebranded to koboldcpp (formerly llamacpp-for-kobold). Library file names and references have changed too, so please let me know if anything is broken!
- Added support for the original GPT4ALL.CPP format!
- Added support for GPT-J formats, including the original 16bit legacy format as well as the 4bit version from Pygmalion.cpp
- Switched compiler flag from `-O3` to `-Ofast`. This should increase generation speed even more, but I don't know if anything will break, so please let me know if it does.
- Changed default threads to scale according to physical core counts instead of `os.cpu_count()`. This will generally result in fewer threads being utilized, but it should provide a better default for slower systems. You can override this manually with the `--threads` parameter (see the sketch below this list).
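Purely as an illustration of the idea (not the project's actual code), choosing a default from physical cores rather than `os.cpu_count()` could look like this; `psutil` is assumed to be available:

```python
import os

import psutil  # assumed helper for detecting the physical core count


def default_thread_count():
    """Pick a default thread count from physical cores, not logical ones.

    os.cpu_count() reports logical cores (hyperthreads included), which can
    oversubscribe slower machines; physical cores are a gentler default.
    """
    physical = psutil.cpu_count(logical=False)
    if physical is None:  # detection can fail on some platforms
        physical = os.cpu_count() or 1
    return max(1, physical)
```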
To use, download and run the koboldcpp.exe
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
llamacpp-for-kobold-1.0.7
- Added support for new version of the ggml llamacpp model format (magic=ggjt, version 3). All old versions will continue to be supported.
- Integrated speed improvements from parent repo.
- Fixed an encoding issue with utf-8 in the outputs.
- Improved console debug information during generation, now shows token progress and time taken directly.
- Set non-streaming to be the default mode. You can enable streaming with `--stream`.
To use, download and run the llamacpp-for-kobold.exe
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
llamacpp-for-kobold-1.0.6-beta
- This is an experimental release containing new integrations for OpenBLAS, which should more than double initial prompt processing speed on compatible systems!
- Updated Embedded Kobold Lite with the latest version which supports pseudo token streaming. This should make the UI feel much more responsive during prompt generation.
- Switched to argparse; you can view all command line flags with `llamacpp-for-kobold.exe --help` (see the sketch below this list).
- To disable OpenBLAS, you can run it with `--noblas`. Please tell me if you have issues with it, and include your specific OS and platform.
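As a rough sketch of what the argparse setup might look like (the real flag set, names, and defaults in llamacpp-for-kobold may differ):

```python
import argparse

# Illustrative sketch only; the actual flags and defaults may differ.
parser = argparse.ArgumentParser(description="llamacpp-for-kobold server")
parser.add_argument("model_file", help="path to a compatible quantized ggml model")
parser.add_argument("port", nargs="?", type=int, default=5001, help="port to listen on")
parser.add_argument("--noblas", action="store_true", help="disable OpenBLAS acceleration")
args = parser.parse_args()
print(f"Loading {args.model_file} on port {args.port} (noblas={args.noblas})")
```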
To use, download and run the llamacpp-for-kobold.exe
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
llamacpp-for-kobold-1.0.5
- Merged the upstream fixes for 65B models.
- Clamped max thread count to 4; this actually provides better results since generation is memory bottlenecked.
- Added support for selecting the KV data type, defaulting to f32 instead of f16.
- Added more default build flags
- Added softprompts endpoint
To use, download and run the llamacpp_for_kobold.exe
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
llamacpp-for-kobold-1.0.4
- Added a script to make standalone pyinstaller .exes, which will be used for all future releases. The `llamacpp.dll` and `llama-for-kobold.py` files are still available by cloning the repo and will be included and updated there.
- Added token caching for prompts, allowing fast-forwarding through partially duplicated prompts. This makes edits towards the end of the previous prompt much faster (see the sketch after this list).
- Merged improvements from parent repo.
- Weights not included.
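As an illustration of the idea only (not the actual implementation), prompt token caching amounts to reusing the longest shared prefix between the previous and the new token sequence, so only the tokens after the edit point need to be re-evaluated:

```python
def common_prefix_length(prev_tokens, new_tokens):
    """Count how many leading tokens the old and new prompts share.

    Tokens inside this prefix are already in the model's KV cache from the
    previous request, so evaluation can fast-forward past them.
    """
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


# Example: editing only the end of the prompt skips most of the work.
prev = [1, 15, 42, 7, 99]
new = [1, 15, 42, 8, 100]
tokens_to_skip = common_prefix_length(prev, new)  # -> 3
```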
To use, download and run the llamacpp_for_kobold.exe
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001
llamacpp-for-kobold-1.0.3
- Applied the massive refactor from the parent repo. It was a huge pain but I managed to keep the old tokenizer untouched and retained full support for the original model formats.
- Reduced default batch sizes greatly, as large batch sizes were causing bad output and high memory usage.
- Added support for dynamic context lengths sent from the client.
- TavernAI is working, although I wouldn't recommend it: it spams the server with multiple huge-context requests, so you're going to have a very painful time getting responses.
Weights not included.
To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
and then you can connect like this (or use the full koboldai client):
http://localhost:5001
llamacpp-for-kobold-1.0.2
- Added an embedded version of Kobold Lite inside (AGPL Licensed)
- Updated to the new ggml model format, while still maintaining support for the old format and the old tokenizer.
- Changed license to AGPL v3. The original GGML library and llama.cpp are still under MIT license in their original repos.
Weights not included.
To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
and then you can connect like this (or use the full koboldai client):
http://localhost:5001
llamacpp-for-kobold-1.0.1
- Bugfixes for OSX, and KV caching now allows continuing a previous generation without reprocessing the whole prompt.
- Weights not included.
To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
and then you can connect like this (or use the full koboldai client):
https://lite.koboldai.net/?local=1&port=5001
llamacpp-for-kobold-1.0.0
Initial version
Weights not included.
To use, download, extract and run:
llama_for_kobold.py [ggml_quant_model.bin] [port]