
v6.9.0 - Welcome Kobold!!

@BBC-Esq released this 14 Oct 03:20

Welcome Kobold edition

Ask Jeeves!

  • An exciting new "Ask Jeeves" helper answers questions about how to use the program. Simply click "Jeeves" in the upper left.
  • "Jeeves" gets his knowledge from a vector database that ships with this release! NO MORE USER GUIDE TAB - just ASK JEEVES!
    • IMPORTANT: After running setup_windows.py, you must first go into the Assets folder, right-click koboldcpp_nocuda.exe, and check the "Unblock" checkbox in the file's Properties. If the checkbox isn't there, try starting Jeeves and see if it works; if it doesn't, create a GitHub Issue, since Ask Jeeves is a new feature. (A scripted alternative is sketched after this list.)
    • IMPORTANT: You may also need to disable your firewall or add an exception for this program. Submit a GitHub Issue if you encounter any problems.
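If you'd rather script the "Unblock" step, the block comes from the "Mark of the Web" that Windows stores in an NTFS alternate data stream named Zone.Identifier on downloaded files; deleting that stream has the same effect as ticking the checkbox. A minimal sketch (the relative path is an assumption, adjust it to wherever your Assets folder actually lives):

```python
# unblock_kobold.py -- remove the "Mark of the Web" from the bundled Kobold binary.
import os

exe_path = r"Assets\koboldcpp_nocuda.exe"   # assumed path; point this at your Assets folder
ads_path = exe_path + ":Zone.Identifier"    # NTFS stream Windows adds to downloaded files

try:
    os.remove(ads_path)                     # deleting the stream is equivalent to "Unblock"
    print(f"Unblocked {exe_path}")
except FileNotFoundError:
    print(f"{exe_path} is not blocked (no Zone.Identifier stream found)")
```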

Scrape Python Library Documentation

  • In the Tools Tab, simply select a Python library, click Scrape, and all of the .html files will be downloaded to the Scraped_Documentation folder.
  • Create a vector database out of all of the .html files for a given library, then use one of the coding-specific models to answer questions!
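Under the hood this amounts to mirroring a documentation site to disk. Purely as an illustration of the idea (the start URL and the one-level-deep crawl are assumptions of this sketch, not necessarily how the Scrape button is implemented):

```python
# scrape_docs_sketch.py -- illustrative only; the Tools Tab does this for you.
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://docs.python-requests.org/en/latest/"   # example library docs site
OUT_DIR = "Scraped_Documentation"

os.makedirs(OUT_DIR, exist_ok=True)
soup = BeautifulSoup(requests.get(START_URL, timeout=30).text, "html.parser")

# Collect same-site links from the index page (one level deep, for brevity).
links = {urljoin(START_URL, a["href"]) for a in soup.find_all("a", href=True)}
pages = sorted(u for u in links if urlparse(u).netloc == urlparse(START_URL).netloc)

for url in pages:
    name = urlparse(url).path.strip("/").replace("/", "_") or "index"
    if not name.endswith(".html"):
        name += ".html"
    with open(os.path.join(OUT_DIR, name), "w", encoding="utf-8") as f:
        f.write(requests.get(url, timeout=30).text)
    print("saved", name)
```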

Huggingface Access Token

  • You can now enter an "access token" to use models that are "gated" on huggingface. Currently, Llama 3.2 - 3b and Mistral-Small - 22b are the only gated models.
  • Ask Jeeves how to get a huggingface access token; the sketch below shows how a token is typically supplied.
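For reference, a gated repo only downloads once huggingface_hub sees your token. A minimal sketch of the usual flow (the repo id is just an example, and you still have to accept the model's license on its Hugging Face page first):

```python
# hf_token_sketch.py -- how a Hugging Face access token is typically used.
from huggingface_hub import login, snapshot_download

# Paste the token generated at https://huggingface.co/settings/tokens
login(token="hf_xxxxxxxxxxxxxxxx")   # placeholder; never commit a real token

# With the token registered, gated repos such as Llama 3.2 can be fetched.
path = snapshot_download("meta-llama/Llama-3.2-3B-Instruct")
print("downloaded to", path)
```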

Other Improvements

  • The vector models are now downloaded with the snapshot_download function from huggingface_hub, which can exclude unnecessary files such as .onnx files and .bin weights (when an equivalent .safetensors version is available). This significantly reduces the amount of data the program downloads and therefore improves speed and usability. (See the sketch after this list.)
  • This speedup applies to vector, chat, and whisper models; snapshot_download support for TTS models is planned.
  • New Compare GPUs button in the Tools Tab, which displays metrics for various GPUs so you can better choose your settings. Charts and graphs for chat/vision models will be added in the near future.
  • New metrics bar with speedometer-style widgets.
  • Removed the User Guide Tab altogether to free up space. You can now simply Ask Jeeves instead.
  • Lots and lots of refactoring to improve various things...
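For anyone curious how the download filtering works: snapshot_download accepts ignore_patterns (and allow_patterns) glob filters. A minimal sketch with an example embedding repo and a hypothetical pattern list, not the program's exact filters:

```python
# snapshot_download_sketch.py -- skip redundant weight formats when mirroring a repo.
from huggingface_hub import snapshot_download

path = snapshot_download(
    "BAAI/bge-small-en-v1.5",                               # example vector/embedding model
    ignore_patterns=["*.onnx", "*.bin", "*.h5", "onnx/*"],  # the .safetensors weights are kept instead
)
print("model files downloaded to", path)
```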

Added/Removed Chat Models

  • Added Qwen 2.5 - 1.5b, Llama 3.2 - 3b, Internlm 2.5 - 1.8b, Dolphin-Llama 3.1 - 8b, Mistral-Small - 22b.
  • Removed Longwriter Llama 3.1 - 8b, Longwriter GLM4 - 9b, Yi - 9b, Solar Pro Preview - 22.1b.

Added/Removed Vision Models

  • Removed the Llava 1.5, Bakllava, Falcon-vlm - 11b, and Phi-3-Vision models, as they were either under-performing or eclipsed by pre-existing models with additional benefits.

Roadmap

  • Add Kobold as a backend in addition to LM Studio and Local Models, at which point I'll probably have to rename this GitHub repo.
  • Add an OpenAI backend.
  • Remove the LM Studio Server settings and revise the instructions, since LM Studio has changed significantly since they were last written.

Full Changelog: v6.8.2...v6.9.0