Releases: BBC-Esq/VectorDB-Plugin-for-LM-Studio
v6.7.0 - LONG CONTEXT no see!
General Updates
- CITATIONS! Responses now include hyperlinked citations when searching the Vector DB.
- Display of a chat model's max context and how many tokens you've used.
2X Speed Increase
Choose "half" in the database creation settings. It will automatically choose `bfloat16` or `float16` based on your GPU. This results in a 2x speed increase with extremely low loss in quality.
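The dtype choice can be sketched like this. This is a minimal illustration assuming selection by CUDA compute capability (Ampere, i.e. 8.x and newer, supports bfloat16 natively); the plugin's actual logic may differ:

```python
def choose_half_dtype(compute_capability):
    """Pick a 16-bit dtype from the GPU's CUDA compute capability:
    Ampere (8.x) and newer support bfloat16 natively; older GPUs
    fall back to float16."""
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float16"

print(choose_half_dtype((8, 6)))  # RTX 30-series -> bfloat16
print(choose_half_dtype((7, 5)))  # RTX 20-series -> float16
```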
Chat Models
- Removed `Internlm2_5 - 1.8b` and `Qwen 1.5 - 1.6b` as underperforming.
- Removed `Dolphin-Llama 3 - 8b` and `Internlm2 - 20b` as superseded.
- Added `Danube 3 - 4b` with 8k context.
- Added `Phi 3.5 Mini - 4b` with 8k context.
- Added `Hermes-3-Llama-3.1 - 8b` with 8k context.
- Added `Internlm2_5 - 20b` with 8k context.
The following models now have 8192 context:

| Model Name | Parameters (billion) | Context Length |
|---|---|---|
| Danube 3 - 4b | 4 | 8192 |
| Dolphin-Qwen 2 - 1.5b | 1.5 | 8192 |
| Phi 3.5 Mini - 4b | 4 | 8192 |
| Internlm2_5 - 7b | 7 | 8192 |
| Dolphin-Llama 3.1 - 8b | 8 | 8192 |
| Hermes-3-Llama-3.1 - 8b | 8 | 8192 |
| Dolphin-Qwen 2 - 7b | 7 | 8192 |
| Dolphin-Mistral-Nemo - 12b | 12 | 8192 |
| Internlm2_5 - 20b | 20 | 8192 |
Text to Speech Models
- Excited to add additional models to choose from when using `whisperspeech` as the text-to-speech backend - see the chart below for the various `s2a` and `t2s` model combinations and "relative" compute times along with real VRAM usage stats.
Current Chat and Vision Models
v6.6.0 - 8192 CONTEXT!
General Updates
- Ensured that vector model pulldown menu auto-updates.
- Made the vector model pulldown menu more descriptive.
Local Models
- Added `Internlm v 2.5 1.8b`. In the last release, version 2.0 of Internlm's 1.8b model was removed. However, the quality increased noticeably with their version 2.5, so I'm re-adding it.
Vector Models
- Excited to add `Alibaba-NLP/gte-base-en-v1.5` and `Alibaba-NLP/gte-large-en-v1.5`. These vector models have a context limit of 8192, which is automatically set within the program. With a conservative estimate of 3 characters per token, that means you can set the chunk size to approximately 24,576!
- Removed `Stella` as it was under-performing and too difficult to work with. There is no love lost, since the prior release marked it as "experimental" anyway.
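For reference, the chunk-size ceiling mentioned above is just the context limit multiplied by the characters-per-token estimate:

```python
def max_chunk_chars(context_limit_tokens, chars_per_token=3):
    """Conservative chunk-size ceiling in characters, assuming
    roughly 3 characters per token."""
    return context_limit_tokens * chars_per_token

print(max_chunk_chars(8192))  # 8192 tokens * 3 chars/token = 24576
```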
Current Chat and Vision Models
v6.5.0 - Llama 3.1 & MiniCPM v2
General updates
- Removed the `triton` dependency, as the `cogvlm` vision model is also removed.
- Redid all benchmarks with more accurate parameters.
Local Models
Overall, the large number of chat models was becoming unnecessary or redundant. Therefore, I removed models that weren't providing optimal responses to simplify the user's experience, and added Llama 3.1.
Removed Models
- Qwen 2 - 0.5b
- Qwen 1.5 - 0.5b
- Qwen 2 - 1.5b
- Qwen 2 - 7b - Redundant with Dolphin Qwen 2 - 7b
- Yi 1.5 - 6b
- Stablelm2 - 12b
- Llama 3 - 8b - Redundant with Dolphin Llama 3 - 8b
Added Models
- Dolphin Llama 3.1 - 8b
Vision Models
Overall, two vision models were removed as unnecessary, and `MiniCPM-V-2_6 - 8b` was added. As of the date of this release, `MiniCPM-V-2_6 - 8b` is the best model in terms of quality. I currently recommend using this model if you have the time and VRAM.
Removed Models
- cogvlm
- MiniCPM-Llama3
Vector Models
- Added `Stella_en_1.5B_v5`, which ranks very high on the leaderboard.
- Note: this is a work in progress, as the results currently seem sub-optimal.
Current Chat and Vision Models
v6.4 - stream responses
Improvements
- All "local models" now stream their responses for a better user experience.
- Various small improvements.
Local Models
- Fixed `Dolphin Phi3-Medium`.
- Added `Yi 1.5 - 6b`.
- Added `H2O Danube3 - 4b` - Great quality small model.
- Removed `Mistral v.03 - 7b` - The model is gated, so it's difficult to implement in a program. Plus, there are a plethora of other good models.
- Removed `Llama 3.1 - 8b` - Same as with Mistral.
- Added `Internlm 2.5 - 7b`.
- Fixed `Dolphin-Mistral-Nemo`.
Vision Models
- Added `Falcon-vlm - 11b` - Great quality. Uses Llava 1.6's processor.

`Falcon-vlm`, `Llava 1.6 Vicuna - 7b`, and `Llava 1.6 Vicuna - 13b` have arguably surpassed `Cogvlm` and are faster for less VRAM. Thus, `Cogvlm` may be deprecated in the future.
Misc.
- Most, but not all, models should now download to the `Models` folder so you can take your folder with you. FYI, ensuring that all models do so is a work in progress, the goal being to carry all of the necessary files + program on a flash drive.
Current Chat and Vision Models
v6.3.0 - whisper upgrade
NOTE
This release has been deleted a few times because of errors, but this one should work now....
Updates:
- Added the large-v3 whisper model and removed large-v2.
- Added all three distil whisper model sizes.
- Ensured that all whisper model files are downloaded to the `Models/whisper` folder in the source code folder.
- Added error handling in the metrics bar for if/when the numbers go over 100% - e.g., a model overflows the VRAM.
- Modified `gui.py` to specify the multiprocessing start method earlier in the script to avoid some errors.
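The start-method change can be sketched generically (this is an illustration of the pattern, not the actual `gui.py` code):

```python
import multiprocessing

# Set the start method once, as early in the script as possible --
# before any Pool or Process object is created. force=True avoids
# "context has already been set" errors if it was set elsewhere.
multiprocessing.set_start_method("spawn", force=True)

print(multiprocessing.get_start_method())  # spawn
```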
v6.2.3 - FAST installation
Uses the impressive `uv` library, written in Rust, for a 2x-4x speedup of `setup_windows.py`.

Make sure to run `pip install uv` first, as outlined in the updated installation instructions.
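The setup flow is roughly as follows (the second command is an assumption based on the script's name; see the repo's installation instructions for the exact steps):

```shell
# Install uv first; the setup script relies on it for fast installs.
pip install uv
python setup_windows.py
```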
v6.2.2 - Welcome LLAVA_NEXT
New Vector Models
Reintroducing these after an unduly long hiatus:
New Vision Models
Welcome `llava-next`, also known as `Llava 1.6`:
Other Changes
- Removed the `sentence-t5-xxl` vector model.
- Set batch sizes for all current vector models.
- Fixed a bug where chat model didn't automatically eject when the program's window was closed, thus preventing the command prompt from being returned to a user.
v6.2.1 - PERFECT install patch
Patch release to add dependencies accidentally missing from `setup_windows.py`. See the release notes for version 6.2.0 for more details on the release itself.
v6.2.0 - PERFECT installation
- Note: use the `setup_windows.py` script attached to this release instead, or check out release 6.2.1.
Breaking Changes
- Overhauled the installation procedure. Too many dependencies were creating conflicts that neither `pip`, `pip-compile`, nor any other approach I'm aware of could solve. Thus, `setup_windows.py` has been completely revamped to install on `Windows` + `Nvidia GPU` systems.
- It should now install every library needed EVERY SINGLE TIME without exceptions. The tradeoff is that it's slightly slower, which is no biggie.
- If I have time, I will re-incorporate an installation procedure for CPU-only systems.
New Chat Models
Other Changes
- Clean up Unneeded Portions of Scripts
- Disable PHI3 Mini Models Temporarily due to errors.
- Update the System Message for the Chat Models
- Upgraded to `transformers==4.43.1` and downgraded to `cuda==12.1` (to avoid errors).
Currently Supported Chat Models:
v6.1.0 - complexity growing!
Version 6.1
Stability-geared release.
Bug Fixes
- VectorDBs can now be created with images again and searched. `Sentence-transformers` was the main culprit.
- Solved the issue of the DB not being created by using `from_texts` instead of `from_documents` within the `TileDB` library.
- Massive improvement in stability when switching to/from "local models." This involved heavy troubleshooting of `multiprocessing`.
- Greatly improved the installation procedure - i.e., `setup_windows.py` and `requirements.txt`, which were responsible for a lot of conflicting dependencies and therefore random errors.
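The `from_texts` workaround can be illustrated with a minimal stand-in (the `Document` class and helper below are illustrative placeholders, not the plugin's or TileDB's actual code): `from_documents` takes document objects, while `from_texts` takes raw strings plus optional metadata, so the documents just need to be unpacked first.

```python
class Document:
    """Minimal stand-in for a LangChain-style document (illustrative only)."""
    def __init__(self, page_content, metadata=None):
        self.page_content = page_content
        self.metadata = metadata or {}

def docs_to_texts(docs):
    """Unpack Document objects into the (texts, metadatas) pair that a
    vector store's from_texts constructor expects, so from_texts can be
    used where from_documents misbehaves."""
    texts = [d.page_content for d in docs]
    metadatas = [d.metadata for d in docs]
    return texts, metadatas

docs = [Document("alpha", {"page": 1}), Document("beta", {"page": 2})]
texts, metadatas = docs_to_texts(docs)
print(texts)      # ['alpha', 'beta']
print(metadatas)  # [{'page': 1}, {'page': 2}]
```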
Regressions
- Temporarily commented out Phi3 (original) models to solve an inference issue, but `dolphin phi3` works fine.