LocalLLaMA tracker

Tracks the r/LocalLLaMA subreddit wiki's coverage of AI models and model releases on the sub (manually updated). Based on this wiki thread: https://www.reddit.com/r/LocalLLaMA/wiki/models/ by u/Civil_Collection7267. Last updated 2 June 2023.

Disclaimer

Use at your own risk. I do not endorse or support the models listed here; use them responsibly. Like a knife, they are a double-edged sword.

Specification

8-bit Specification for LLaMA

| Model     | VRAM Used | Minimum Total VRAM | Card Examples          | RAM/Swap to Load* |
|-----------|-----------|--------------------|------------------------|-------------------|
| LLaMA-7B  | 9.2 GB    | 10 GB              | 3060 12GB, 3080 10GB   | 24 GB             |
| LLaMA-13B | 16.3 GB   | 20 GB              | 3090, 3090 Ti, 4090    | 32 GB             |
| LLaMA-30B | 36 GB     | 40 GB              | A6000 48GB, A100 40GB  | 64 GB             |
| LLaMA-65B | 74 GB     | 80 GB              | A100 80GB              | 128 GB            |

*System RAM, not VRAM, required to load the model, in addition to having enough VRAM. It is not required to run the model; you can use swap space if you do not have enough RAM.

4-bit Specification for LLaMA

| Model     | Minimum Total VRAM | Card Examples                                            | RAM/Swap to Load* |
|-----------|--------------------|----------------------------------------------------------|-------------------|
| LLaMA-7B  | 6 GB               | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060              | 6 GB              |
| LLaMA-13B | 10 GB              | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000       | 12 GB             |
| LLaMA-30B | 20 GB              | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB            |
| LLaMA-65B | 40 GB              | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000          | 64 GB             |

*System RAM, not VRAM, required to load the model, in addition to having enough VRAM. It is not required to run the model; you can use swap space if you do not have enough RAM.
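As a rough cross-check on the tables above, weight memory is approximately parameter count times bytes per weight, plus overhead for activations, context cache, and framework buffers. The helper below is a back-of-the-envelope sketch of our own (not from the wiki); the tables hold measured values, and this only approximates them:

```python
# Hypothetical rule-of-thumb estimator; `overhead` is a guess covering
# activations, context cache, and framework buffers -- tune to taste.
def estimate_vram_gb(params_billion: float, bits: int, overhead: float = 1.3) -> float:
    weight_gb = params_billion * bits / 8  # 1e9 params * (bits/8) bytes ~= GB
    return weight_gb * overhead

for size in (7, 13, 30, 65):
    print(f"LLaMA-{size}B: ~{estimate_vram_gb(size, 8):.1f} GB at 8-bit, "
          f"~{estimate_vram_gb(size, 4):.1f} GB at 4-bit")
```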

Current Best Models

The current best LLaMA-based models:

"Best choice" means best for most tasks; there are other options for different niches. For a model like Vicuna but with fewer restrictions, use GPT4 x Vicuna. For RP chatting, use base LLaMA 30B or 65B without LoRA and with a character card.

For writing stories, use the current best choice below if you want the least amount of effort for decent results. If you want highly detailed and personalized stories and don't mind spending a lot of time on prompting, use base LLaMA 30B or 65B without LoRA.

Hugging Face

7B: Vicuna 7B v1.1

13B: Vicuna 13B v1.1

30B: Guanaco

65B: Guanaco 65B

7B 4-bit GPTQ: Vicuna 7B v1.1 4-bit

13B 4-bit GPTQ: Vicuna 13B v1.1 4-bit

30B 4-bit GPTQ: GPT4 Alpaca LoRA 30B Merge*

65B 4-bit GPTQ: Guanaco 65B 4-bit

llama.cpp

7B: Vicuna v1.1

13B: Vicuna v1.1

30B: GPT4 Alpaca LoRA Merge*

65B: Guanaco 65B

*Use OASST LLaMA 30B below for the closest ChatGPT clone.


All Downloads

r/LocalLLaMA does not endorse, claim responsibility for, or associate with any models, groups, or individuals listed here. If you would like your link added or removed from this list, please send a message to modmail.

This list is not comprehensive but should include most relevant links. If you plan on copying this list to use elsewhere but won't be updating it yourself, feel free to link back to this wiki page as this will be kept updated with the latest downloads.

Some links may have multiple formats. Always use .safetensors when available.
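One reason to prefer .safetensors: unlike pickle-based .bin/.pt checkpoints opened with torch.load, a safetensors file stores raw tensors only and cannot execute arbitrary code on load. A minimal sketch using the safetensors library (the file path is a placeholder):

```python
# pip install safetensors torch
from safetensors.torch import load_file

# Safetensors stores raw tensors only; loading cannot run code,
# unlike unpickling an untrusted torch .bin/.pt checkpoint.
state_dict = load_file("model.safetensors")  # placeholder path
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```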

Models

Base 7B-65B 4-bit without groupsize can be downloaded here.

Base 7B-65B 4-bit with groupsize can be downloaded here.

Due to the increasing number of models available, parts of this section have been split into charts for easier comparison. The models listed directly below have been tested for their quality.

Models listed in the Extra section are not worse than the models in the chart but are generally unique in some way. For example, MedAlpaca was made for medical domain tasks, LLaVA for visual instruction, etc.

Sorted approximately from best to worst (a subjective comparison), by category:

7B

| Models (restricted)   | Models (unrestricted)     |
|-----------------------|---------------------------|
| Vicuna 7B v1.1        | WizardLM Uncensored 2*    |
| Baize V2 7B 1*        | AlpacaGPT4 7B 3*          |
| Vicuna Evol-Instruct  | Alpaca Native             |
| Vicuna 7B v1.1 4-bit  | WizardLM Uncensored 4-bit |
| Baize V2 7B 4-bit     | Alpaca Native 4-bit       |
|                       | LLaMA                     |

Extra: WizardVicunaLM Uncensored, LLaMA Deus V3 Merge, Pygmalion 7B, Pygmalion 7B 4-bit, Metharme 7B, Metharme 7B 4-bit, PubMed LLaMA 7B, MedAlpaca 7B, Alpaca Native Enhanced

1* This has lighter restrictions than Vicuna and was previously listed in the unrestricted section, but it may trend toward shorter generations than Vicuna.

2* This is better than AlpacaGPT4 in most areas, especially assistant tasks, but is generally worse for long creative generations.

3* This may be prone to light restrictions that do not necessarily impact the model's quality. The coherency of the model can initially seem dubious, but it works best when given a good prompt to start with. This 7B model is ideal for storywriting and should be adept at longer generations compared to others in the list.

13B

| Models (restricted)    | Models (unrestricted)   | Models (other)                     |
|------------------------|-------------------------|------------------------------------|
| Vicuna 13B v1.1        | GPT4 x Vicuna 2*        | WizardLM 13B 1.0 4*                |
| Vicuna 13B v1.1 4-bit  | GPT4 x Vicuna 4-bit 2*  | WizardLM 13B 1.0 4-bit 4*          |
| StableVicuna 1*        | GPT4 x Alpaca 3*        | WizardVicunaLM 5*                  |
| StableVicuna 4-bit 1*  | GPT4 x Alpaca 4-bit 3*  | WizardVicunaLM 4-bit 5*            |
| Baize V2 13B (4-bit)   | Alpaca Native           | WizardVicunaLM Uncensored 5*       |
| OASST LLaMA (4-bit)    | LLaMA                   | WizardVicunaLM Uncensored 4-bit 5* |

Notable Mention: LLaMA with AlpacaGPT4 LoRA 13B for longer creative generations.

Extra: Manticore 13B (4-bit), GPT4All 13B snoozy (4-bit), Chronos 13B (4-bit), Pygmalion 13B (4-bit), Metharme 13B (4-bit), WizardLM 13B Uncensored (4-bit), Vicuna Evol-Instruct, LLaVA Delta, MedAlpaca 13B, GPT4 x Alpaca Roleplay Merge (4-bit V2), pretrained-sft-do2 (4-bit), Toolpaca, Vicuna 13B v0 (4-bit), WizardLM 13B 1.0 diff weights

1* StableVicuna has almost universally higher benchmarks than regular Vicuna, but it fails challenge questions that even Vicuna 7B can answer. It is also based on Vicuna v0. For real usage, its quality seems about on par or slightly worse than Vicuna v1.1.

2* Not completely unrestricted, and this model fails several logic tests that GPT4 x Alpaca passes. However, it may be better than GPT4 x Alpaca for creative tasks. While its restrictions are almost negligible, it inherits some of Vicuna's inherent limitations. Without proper prompting, this may result in generations with similar plot progressions and endings like ChatGPT, e.g. "they lived happily ever after"

3* The original top choice for weeks and a model that can still be used today for various creative uses. GPT4 x Alpaca naturally produces flowery language that some may consider ideal for storytelling. However, this model may be considered the worst for following complex instructions.

4* This is an official release from the WizardLM team trained with the full dataset of 250K evolved instructions. It adopts the prompt format from Vicuna v1.1, and this model should be used over the older, experimental WizardVicunaLM.

5* This is an experimental model designed for proof of concept. It is a combination of WizardLM's dataset, ChatGPT's conversation extension, and Vicuna's tuning method.

30B

| Models (restricted)           | Models (unrestricted)     |
|-------------------------------|---------------------------|
| Guanaco (4-bit)               | GPT4 Alpaca LoRA Merge 2* |
| OASST RLHF 2 LLaMA (4-bit) 1* | Alpaca LoRA 30B Merge     |
| OASST SFT 7 LLaMA 4-bit 1*    | LLaMA                     |

Extra: WizardLM 30B Uncensored, WizardLM 30B Uncensored 4-bit, OASST SFT 6 LLaMA 4-bit (commit 1c2afcb), OASST RLHF 2 LLaMA XOR, OASST SFT 7 LLaMA XOR

1* This is a finalized version of OASST LLaMA from Open Assistant.

2* This may be more prone to hallucinatory issues than the original Alpaca LoRA Merge.

65B

Guanaco 65B 4-bit

LLaMA

Extra: LLaMA-Adapter V2 Chat

LoRA

Sorted alphabetically:

Alpaca 7B

Alpaca 13B

Alpaca 30B

Alpaca 65B

Alpaca 7B Elina*

Alpaca 13B Elina*

Alpaca 30B Elina*

Alpaca 65B Elina*

AlpacaGPT4 7B Elina*

AlpacaGPT4 13B Elina*

Baize 7B

Baize 7B Healthcare

Baize 13B

Baize 30B

gpt4all (7B)

GPT4 Alpaca 7B**

GPT4 Alpaca 13B**

GPT4 Alpaca 30B**

GPT4 Alpaca 65B**

GPT4 x Alpaca RP (13B)

LLaMA Deus V3 (7B)

MedAlpaca 7B

MedAlpaca 13B

MedAlpaca 30B

StackLLaMA 7B

SuperCOT (7B, 13B, 30B)

Vicuna Evol-Instruct 7B

Vicuna Evol-Instruct 13B

Vicuna Evol-Instruct Starcoder (13B)

*Alpaca LoRA Elina checkpoints are trained with longer cutoff lengths than their original counterparts. AlpacaGPT4 Elina supersedes GPT4 Alpaca.

**GPT4 Alpaca and GPT4 x Alpaca are not the same. GPT4 Alpaca uses the GPT-4 dataset from Microsoft Research.

Other Languages

Sorted alphabetically:

Chinese Alpaca LoRA (GitHub): 7B, 13B

Chinese ChatFlow (GitHub): 7B, 13B

Chinese LLaMA Extended (GitHub): 7B, 13B

Chinese LLaMA LoRA (GitHub): 7B, 13B

Chinese Vicuna LoRA (GitHub): 7B, 13B

French LoRA (GitHub): 7B, 13B, 30B

Italian LoRA (GitHub): 7B, 13B

Japanese LoRA (GitHub): 7B, 13B, 30B, 65B

Korean LoRA (GitHub): 13B, 30B, 65B

Portuguese LoRA (GitHub)

Russian LoRA 7B Merge

Russian LoRA 13B

Spanish LoRA 7B

llama.cpp

This list is curated; models not considered worth including are omitted.

Update: The ggml quantization format has changed. Model files using the old format will not work with the latest llama.cpp code; if you want to use old-format models, commit cf348a6 predates the breaking change. This list may still include a few models in the old format.
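If you prefer Python over the raw llama.cpp binary, the llama-cpp-python bindings wrap the same ggml loader. A minimal sketch (the model path and sampling settings are placeholders, and the file must use the current ggml format per the note above):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path; files quantized before the breaking change won't load.
llm = Llama(model_path="./models/vicuna-13b-v1.1.ggmlv3.q4_0.bin", n_ctx=2048)

# Vicuna v1.1 prompt format (see Prompt Templates below).
prompt = ("A chat between a curious user and an artificial intelligence "
          "assistant. The assistant gives helpful, detailed, and polite "
          "answers to the user's questions.\n"
          "USER: What is LLaMA? ASSISTANT:")
out = llm(prompt, max_tokens=128, stop=["USER:"])
print(out["choices"][0]["text"])
```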

Sorted alphabetically:

7B

Alpaca Native

Baize V2 7B

Metharme 7B

Pygmalion 7B

Vicuna v1.1

WizardLM Uncensored

Extra or old format: MedAlpaca, Vicuna v0

13B

Baize V2 13B

GPT4All 13B snoozy

GPT4 x Alpaca

GPT4 x Vicuna

Metharme 13B

Pygmalion 13B

StableVicuna

Vicuna v1.1

WizardLM 13B Uncensored

WizardLM 13B 1.0*

WizardVicunaLM**

WizardVicunaLM Uncensored**

Extra or old format: Vicuna v0, OASST LLaMA, pretrained-sft-do2, Alpaca Native, Toolpaca

*This is an official release from the WizardLM team trained with the full dataset of 250K evolved instructions. It adopts the prompt format from Vicuna v1.1, and this model should be used over the older, experimental WizardVicunaLM.

**This is an experimental model designed for proof of concept. It is a combination of WizardLM's dataset, ChatGPT's conversation extension, and Vicuna's tuning method.

30B

GPT4 Alpaca LoRA Merge

Guanaco

OASST SFT 7 LLaMA

SuperCOT

WizardLM 30B Uncensored

WizardVicunaLM 30B Uncensored

Extra or old format: Alpaca LoRA Merge

65B

Guanaco

VicUnlocked Alpaca 65B


Prompt Templates

For optimal results, you need to use the correct prompt template for the model you're using. This section lists the main prompt templates and examples of which models use them. The list is not comprehensive.

Alpaca

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
*your text here*

### Response:
```

Applies to: Alpaca LoRA, Alpaca Native, GPT4 Alpaca LoRA, GPT4 x Alpaca

Alpaca with Input

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
*your text here*

### Input:
*your text here*

### Response:
```

Applies to: Alpaca LoRA, Alpaca Native, GPT4 Alpaca LoRA, GPT4 x Alpaca
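A hypothetical helper of our own (not from any library) showing how the two Alpaca templates above are assembled, including the "Input" block only when further context is supplied:

```python
def alpaca_prompt(instruction: str, input_text: str | None = None) -> str:
    """Assemble an Alpaca-style prompt from the two templates above."""
    if input_text:
        header = ("Below is an instruction that describes a task, paired with "
                  "an input that provides further context. Write a response "
                  "that appropriately completes the request.")
        body = (f"### Instruction:\n{instruction}\n\n"
                f"### Input:\n{input_text}\n\n### Response:\n")
    else:
        header = ("Below is an instruction that describes a task. Write a "
                  "response that appropriately completes the request.")
        body = f"### Instruction:\n{instruction}\n\n### Response:\n"
    return f"{header}\n\n{body}"

print(alpaca_prompt("Summarize the plot of Hamlet."))
```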

OpenAssistant LLaMA

```
<|prompter|>*your text here*<|endoftext|><|assistant|>
```

Applies to: OASST LLaMA 13B, OASST SFT 7 LLaMA, OASST RLHF 2 LLaMA, pretrained-sft-do2

Vicuna v0

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human: *your text here*
### Assistant:
```

Applies to: StableVicuna v0, Vicuna v0

Vicuna v1.1

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: *your text here*
ASSISTANT:
```

Applies to: StableVicuna v2, Vicuna Evol-Instruct, Vicuna v1.1, WizardVicunaLM and derivatives
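For multi-turn chat, completed exchanges are concatenated in front of the new user turn; FastChat's Vicuna v1.1 template additionally ends each finished assistant turn with the `</s>` end-of-sequence token (a detail from FastChat, not this wiki). A hypothetical helper:

```python
SYSTEM = ("A chat between a curious user and an artificial intelligence "
          "assistant. The assistant gives helpful, detailed, and polite "
          "answers to the user's questions.")

def vicuna_v11_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    # `history` holds finished (user, assistant) exchanges; the final
    # "ASSISTANT:" is left open for the model to continue.
    turns = "".join(f" USER: {u} ASSISTANT: {a}</s>" for u, a in history)
    return f"{SYSTEM}{turns} USER: {user_msg} ASSISTANT:"

print(vicuna_v11_prompt([("Hi!", "Hello! How can I help you today?")],
                        "What is LLaMA?"))
```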

Other Templates

GPT4 x Vicuna

```
### Instruction:
*your text here*

### Response:
```

or

```
### Instruction:
*your text here*

### Input:
*your text here*

### Response:
```

Guanaco QLoRA*

```
### Human: *your text here*

### Assistant:
```

*This should not be confused with the older Guanaco model made by a separate group and using a different dataset.

Metharme and Pygmalion

Metharme explanation

Pygmalion explanation

WizardLM 7B

```
*your text here*

### Response:
```

WizardLM 13B 1.0*

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: *your text here* ASSISTANT:
```

*This should not be confused with the older WizardLM models that use the dataset of 70K evolved instructions.
