LocalLLaMA tracker

Tracks the r/LocalLLaMA subreddit wiki's coverage of AI models and model releases (updated manually). Based on this wiki page: https://www.reddit.com/r/LocalLLaMA/wiki/models/ by u/Civil_Collection7267. Last update: 2 June 2023.

Disclaimer

Use at your own risk. I do not endorse or support the models listed here. Use them responsibly; like a knife, they are a double-edged sword.

Specification

8-Bit Specification for LLaMA

Model	VRAM Used	Minimum Total VRAM	Card examples	RAM/Swap to Load*
LLaMA-7B	9.2GB	10GB	3060 12GB, 3080 10GB	24 GB
LLaMA-13B	16.3GB	20GB	3090, 3090 Ti, 4090	32 GB
LLaMA-30B	36GB	40GB	A6000 48GB, A100 40GB	64 GB
LLaMA-65B	74GB	80GB	A100 80GB	128 GB

*System RAM, not VRAM, is required to load the model, in addition to having enough VRAM. It is not required to run the model. You can use swap space if you do not have enough RAM.
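As a rough cross-check on the figures above, the memory taken by the weights alone can be estimated as parameter count times bits per weight. The sketch below is an assumption-laden illustration: it uses the nominal sizes from the model names (7, 13, 30, 65 billion parameters) and decimal gigabytes, and the function name is hypothetical. Real usage is higher than this lower bound, since running a model needs memory for more than the weights.

```python
def weight_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes (10^9 bytes)
    return params_billions * bytes_per_weight


# Nominal sizes taken from the model names; actual parameter counts differ slightly.
for name, size in [("LLaMA-7B", 7), ("LLaMA-13B", 13), ("LLaMA-30B", 30), ("LLaMA-65B", 65)]:
    print(f"{name}: 8-bit ~{weight_size_gb(size, 8):.1f} GB, 4-bit ~{weight_size_gb(size, 4):.1f} GB")
```

Halving the bits per weight halves the estimate, which is why 4-bit quantization fits the same model on a much smaller card.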

4-Bit Specification for LLaMA

Model	Minimum Total VRAM	Card examples	RAM/Swap to Load*
LLaMA-7B	6GB	GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060	6 GB
LLaMA-13B	10GB	AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000	12 GB
LLaMA-30B	20GB	RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100	32 GB
LLaMA-65B	40GB	A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000	64 GB

*System RAM, not VRAM, is required to load the model, in addition to having enough VRAM. It is not required to run the model. You can use swap space if you do not have enough RAM.
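The two tables can be turned into a small lookup for checking which models a given card could hold. This is only a sketch: the numbers are the "Minimum Total VRAM" columns copied from the tables above, and the dictionary and function names are made up for illustration.

```python
# Minimum total VRAM in GB, copied from the 8-bit and 4-bit tables above.
MIN_VRAM_GB = {
    8: {"LLaMA-7B": 10, "LLaMA-13B": 20, "LLaMA-30B": 40, "LLaMA-65B": 80},
    4: {"LLaMA-7B": 6, "LLaMA-13B": 10, "LLaMA-30B": 20, "LLaMA-65B": 40},
}


def models_that_fit(vram_gb: float, bits: int) -> list[str]:
    """Return the models whose minimum total VRAM is within vram_gb."""
    return [name for name, need in MIN_VRAM_GB[bits].items() if need <= vram_gb]


print(models_that_fit(24, 4))  # a single 24 GB card, 4-bit
print(models_that_fit(24, 8))  # the same card, 8-bit
```

Remember that loading the model also needs the system RAM or swap listed in the last column of each table.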

Current Best Models

Current LLaMA-based choices

Best choice means best for most tasks. There are other options for different niches. For a model like Vicuna but with fewer restrictions, use GPT4 x Vicuna. For RP chatting, use base LLaMA 30B or 65B without LoRA and with a character card.

For writing stories, use the current best choice below if you want the least amount of effort for decent results. If you want highly detailed and personalized stories and don't mind spending a lot of time on prompting, use base LLaMA 30B or 65B without LoRA.

Hugging Face

7B: Vicuna 7B v1.1

13B: Vicuna 13B v1.1

30B: Guanaco

65B: Guanaco 65B

7B 4-bit GPTQ: Vicuna 7B v1.1 4-bit

13B 4-bit GPTQ: Vicuna 13B v1.1 4-bit

30B 4-bit GPTQ: GPT4 Alpaca LoRA 30B Merge*

65B 4-bit GPTQ: Guanaco 65B 4-bit

llama.cpp

7B: Vicuna v1.1

13B: Vicuna v1.1

30B: GPT4 Alpaca LoRA Merge*

65B: Guanaco 65B

*Use OASST LLaMA 30B below for the closest ChatGPT clone.

All Downloads

r/LocalLLaMA does not endorse, claim responsibility for, or associate with any models, groups, or individuals listed here. If you would like your link added or removed from this list, please send a message to modmail.

This list is not comprehensive but should include most relevant links. If you plan on copying this list to use elsewhere but won't be updating it yourself, feel free to link back to this wiki page as this will be kept updated with the latest downloads.

Some links may have multiple formats. Always use .safetensors when available.
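When a download offers several formats, the rule above can be applied mechanically. The following is a minimal sketch, assuming you already have a list of file names from a download page; the function name and the example file names are hypothetical.

```python
from typing import Optional


def pick_weights_file(filenames: list[str]) -> Optional[str]:
    """Prefer a .safetensors file; otherwise fall back to the first file listed."""
    safetensors = [f for f in filenames if f.lower().endswith(".safetensors")]
    if safetensors:
        return safetensors[0]
    return filenames[0] if filenames else None


# Hypothetical file names, for illustration only.
print(pick_weights_file(["model-4bit.pt", "model-4bit.safetensors"]))
```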

Models

Base 7B-65B 4-bit without groupsize can be downloaded here.

Base 7B-65B 4-bit with groupsize can be downloaded here.

Due to the increasing number of models available, parts of this section have been split into charts for easier comparison. The models listed directly below have been tested for their quality.

Models listed in the Extra section are not worse than the models in the chart but are generally unique in some way. For example, MedAlpaca was made for medical domain tasks, LLaVA for visual instruction, etc.

Sorted approximately from best to worst, subjective comparison by category:

7B

Models restricted

Models unrestricted

Vicuna 7B v1.1

WizardLM Uncensored 2*

WizardLM Uncensored 4-bit

Baize V2 7B 4-bit

Alpaca Native 4-bit

LLaMA

Extra: WizardVicunaLM Uncensored, LLaMA Deus V3 Merge, Pygmalion 7B, Pygmalion 7B 4-bit, Metharme 7B, Metharme 7B 4-bit, PubMed LLaMA 7B, MedAlpaca 7B, Alpaca Native Enhanced

1* This has lighter restrictions than Vicuna and was previously listed in the unrestricted section, but it may trend toward shorter generations than Vicuna.

2* This is better than AlpacaGPT4 in most areas, especially assistant tasks, but is generally worse for long creative generations.

3* This may be prone to light restrictions that do not necessarily impact the model's quality. The coherency of the model can initially seem dubious, but it works best when given a good prompt to start with. This 7B model is ideal for storywriting and should be adept at longer generations compared to others in the list.

13B

Models restricted

Models unrestricted

Models other

Vicuna 13B v1.1

GPT4 x Vicuna 2*

WizardLM 13B 1.0 4*

Vicuna 13B v1.1 4-bit

GPT4 x Vicuna 4-bit 2*

WizardLM 13B 1.0 4-bit 4*

GPT4 x Alpaca 4-bit 3*

WizardVicunaLM 4-bit 5*

Baize V2 13B (4-bit)

Alpaca Native

WizardVicunaLM Uncensored 5*

OASST LLaMA (4-bit)

LLaMA

WizardVicunaLM Uncensored 4-bit 5*

Notable Mention: LLaMA with AlpacaGPT4 LoRA 13B for longer creative generations.

Extra: Manticore 13B (4-bit), GPT4All 13B snoozy (4-bit), Chronos 13B (4-bit), Pygmalion 13B (4-bit), Metharme 13B (4-bit), WizardLM 13B Uncensored (4-bit), Vicuna Evol-Instruct, LLaVA Delta, MedAlpaca 13B, GPT4 x Alpaca Roleplay Merge (4-bit V2), pretrained-sft-do2 (4-bit), Toolpaca, Vicuna 13B v0 (4-bit), WizardLM 13B 1.0 diff weights

1* StableVicuna has almost universally higher benchmarks than regular Vicuna, but it fails challenge questions that even Vicuna 7B can answer. It is also based on Vicuna v0. For real usage, its quality seems about on par or slightly worse than Vicuna v1.1.

2* Not completely unrestricted, and this model fails several logic tests that GPT4 x Alpaca passes. However, it may be better than GPT4 x Alpaca for creative tasks. While its restrictions are almost negligible, it inherits some of Vicuna's inherent limitations. Without proper prompting, this may result in generations with similar plot progressions and endings like ChatGPT, e.g. "they lived happily ever after"

3* The original top choice for weeks and a model that can still be used today for various creative uses. GPT4 x Alpaca naturally produces flowery language that some may consider ideal for storytelling. However, this model may be considered the worst for following complex instructions.

4* This is an official release from the WizardLM team trained with the full dataset of 250K evolved instructions. It adopts the prompt format from Vicuna v1.1, and this model should be used over the older, experimental WizardVicunaLM.

5* This is an experimental model designed for proof of concept. It is a combination of WizardLM's dataset, ChatGPT's conversation extension, and Vicuna's tuning method.

30B

Models restricted

Models unrestricted

Guanaco (4-bit)

GPT4 Alpaca LoRA Merge 2*

OASST RLHF 2 LLaMA (4-bit) 1*

Alpaca LoRA 30B Merge

OASST SFT 7 LLaMA 4-bit 1*

LLaMA

Extra: WizardLM 30B Uncensored, WizardLM 30B Uncensored 4-bit, OASST SFT 6 LLaMA 4-bit commit 1c2afcb, OASST RLHF 2 LLaMA XOR, OASST SFT 7 LLaMA XOR

1* This is a finalized version of OASST LLaMA from Open Assistant.

2* This may be more prone to hallucinatory issues than the original Alpaca LoRA Merge.

65B

Guanaco 65B 4-bit

LLaMA

Extra: LLaMA-Adapter V2 Chat

LoRA

Sorted alphabetically:

AlpacaGPT4 13B Elina*

GPT4 x Alpaca RP (13B)

SuperCOT (7B, 13B, 30B)

Vicuna Evol-Instruct 7B

Vicuna Evol-Instruct 13B

Vicuna Evol-Instruct Starcoder (13B)

*Alpaca LoRA Elina checkpoints are trained with longer cutoff lengths than their original counterparts. AlpacaGPT4 Elina supersedes GPT4 Alpaca.

**GPT4 Alpaca and GPT4 x Alpaca are not the same. GPT4 Alpaca uses the GPT-4 dataset from Microsoft Research.

Other Languages

Sorted alphabetically:

Chinese Alpaca LoRA (GitHub): 7B, 13B

Chinese ChatFlow (GitHub): 7B, 13B

Chinese LLaMA Extended (GitHub): 7B, 13B

Chinese LLaMA LoRA (GitHub): 7B, 13B

Chinese Vicuna LoRA (GitHub): 7B, 13B

French LoRA (GitHub): 7B, 13B, 30B

Italian LoRA (GitHub): 7B, 13B

Japanese LoRA (GitHub): 7B, 13B, 30B, 65B

Korean LoRA (GitHub): 13B, 30B, 65B

Portuguese LoRA (GitHub)

Russian LoRA 7B Merge

Russian LoRA 13B

Spanish LoRA 7B

llama.cpp

Models that aren't worth including are not listed here.

Update: The quantization format has been updated. All ggml model files using the old format will not work with the latest llama.cpp code. If you want to use models with the old format, commit cf348a6 is before the breaking change. This list may include a few models in the old format.

Sorted alphabetically:

7B

Extra or old format: MedAlpaca, Vicuna v0

13B

WizardLM 13B Uncensored

WizardLM 13B 1.0*

WizardVicunaLM**

WizardVicunaLM Uncensored**

Extra or old format: Vicuna v0, OASST LLaMA, pretrained-sft-do2, Alpaca Native, Toolpaca

*This is an official release from the WizardLM team trained with the full dataset of 250K evolved instructions. It adopts the prompt format from Vicuna v1.1, and this model should be used over the older, experimental WizardVicunaLM.

**This is an experimental model designed for proof of concept. It is a combination of WizardLM's dataset, ChatGPT's conversation extension, and Vicuna's tuning method.

30B

GPT4 Alpaca LoRA Merge

Guanaco

OASST SFT 7 LLaMA

SuperCOT

WizardLM 30B Uncensored

WizardVicunaLM 30B Uncensored

Extra or old format: Alpaca LoRA Merge

65B

Guanaco

VicUnlocked Alpaca 65B

Prompt Templates

For optimal results, you need to use the correct prompt template for the model you're using. This section lists the main prompt templates and some examples of models that use each one. This list is not comprehensive.

Alpaca

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
*your text here*

### Response:

Applies to: Alpaca LoRA, Alpaca Native, GPT4 Alpaca LoRA, GPT4 x Alpaca

Alpaca with Input

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
*your text here*

### Input:
*your text here*

### Response:

Applies to: Alpaca LoRA, Alpaca Native, GPT4 Alpaca LoRA, GPT4 x Alpaca
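The two Alpaca templates above can be filled in with a small helper. This is a sketch only: the function name is hypothetical, and the blank lines between sections follow the layout shown above, so check the model's own documentation if exact whitespace matters.

```python
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
ALPACA_HEADER_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request."
)


def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca prompt, using the 'with Input' variant when input_text is given."""
    if input_text:
        return (
            f"{ALPACA_HEADER_WITH_INPUT}\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return f"{ALPACA_HEADER}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


print(alpaca_prompt("Summarize the text.", "LLaMA is a family of language models."))
```

The model's answer is whatever it generates after the final "### Response:" line.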

OpenAssistant LLaMA:

<|prompter|>*your text here*<|endoftext|><|assistant|>

Applies to: OASST LLaMA 13B, OASST SFT 7 LLaMA, OASST RLHF 2 LLaMA, pretrained-sft-do2
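The OpenAssistant template is a single line built from special tokens, so a helper for it is short. The function name below is hypothetical; the token strings are copied from the template above.

```python
def oasst_prompt(text: str) -> str:
    """Wrap one user message in the OpenAssistant LLaMA prompt tokens."""
    return f"<|prompter|>{text}<|endoftext|><|assistant|>"


print(oasst_prompt("What is a LoRA?"))
```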

Vicuna v0

A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.

### Human: *your text here*
### Assistant:

Applies to: StableVicuna v0, Vicuna v0

Vicuna v1.1

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: *your text here*
ASSISTANT:

Applies to: StableVicuna v2, Vicuna Evol-Instruct, Vicuna v1.1, WizardVicunaLM and derivatives
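Vicuna v0 and v1.1 differ in both the system sentence ("human" versus "user") and the role markers, so it is easy to mix them up. The sketch below keeps both side by side for a single-turn prompt; the dictionary and function names are hypothetical, and the line breaks follow the layout shown above.

```python
VICUNA_TEMPLATES = {
    "v0": (
        "A chat between a curious human and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n"
        "### Human: {text}\n### Assistant:"
    ),
    "v1.1": (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions.\n\n"
        "USER: {text}\nASSISTANT:"
    ),
}


def vicuna_prompt(text: str, version: str = "v1.1") -> str:
    """Build a single-turn Vicuna prompt for the given template version."""
    return VICUNA_TEMPLATES[version].format(text=text)


print(vicuna_prompt("Name three LLaMA model sizes."))
```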

Other Templates

GPT4 x Vicuna:

### Instruction:
*your text here*

### Response:

or

### Instruction:
*your text here*

### Input:
*your text here*

### Response:

Guanaco QLoRA*

### Human: *your text here*

### Assistant:

*This should not be confused with the older Guanaco model made by a separate group and using a different dataset.

Metharme and Pygmalion

Metharme explanation

Pygmalion explanation

WizardLM 7B

*your text here*

### Response:

WizardLM 13B 1.0*

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: *your text here* ASSISTANT:

*This should not be confused with the older WizardLM models that use the dataset of 70K evolved instructions.
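Since picking the wrong template degrades results, it can help to record which template family each model uses. The mapping below is a sketch built only from the "Applies to" lines in the Prompt Templates section; the names `TEMPLATE_FOR_MODEL` and `template_for` are hypothetical, and models from Other Templates are left out because each has its own format listed there.

```python
# Model names and template families as given in the "Applies to" lines.
TEMPLATE_FOR_MODEL = {
    "Alpaca LoRA": "Alpaca",
    "Alpaca Native": "Alpaca",
    "GPT4 Alpaca LoRA": "Alpaca",
    "GPT4 x Alpaca": "Alpaca",
    "OASST LLaMA 13B": "OpenAssistant LLaMA",
    "OASST SFT 7 LLaMA": "OpenAssistant LLaMA",
    "OASST RLHF 2 LLaMA": "OpenAssistant LLaMA",
    "pretrained-sft-do2": "OpenAssistant LLaMA",
    "StableVicuna v0": "Vicuna v0",
    "Vicuna v0": "Vicuna v0",
    "StableVicuna v2": "Vicuna v1.1",
    "Vicuna Evol-Instruct": "Vicuna v1.1",
    "Vicuna v1.1": "Vicuna v1.1",
    "WizardVicunaLM": "Vicuna v1.1",
}


def template_for(model: str) -> str:
    """Look up the template family for a model, failing loudly if it is unknown."""
    try:
        return TEMPLATE_FOR_MODEL[model]
    except KeyError:
        raise KeyError(f"No template recorded for {model!r}; check its model card") from None


print(template_for("StableVicuna v2"))
```

Failing on an unknown model, rather than guessing a default, avoids silently prompting with the wrong format.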

tukangcode/LocalLLaMA-tracker
