Tracker for AI model releases covered by the r/LocalLLaMA subreddit wiki (updated manually). Based on the wiki page https://www.reddit.com/r/LocalLLaMA/wiki/models/ by u/Civil_Collection7267. Last update: 2 June 2023.
Use at your own risk. I do not endorse or support the models listed here. Use them responsibly: like a knife, they are a double-edged sword.
8-bit Specifications for LLaMA
Model | VRAM Used | Minimum Total VRAM | Card examples | RAM/Swap to Load* |
---|---|---|---|---|
LLaMA-7B | 9.2GB | 10GB | 3060 12GB, 3080 10GB | 24 GB |
LLaMA-13B | 16.3GB | 20GB | 3090, 3090 Ti, 4090 | 32 GB |
LLaMA-30B | 36GB | 40GB | A6000 48GB, A100 40GB | 64 GB |
LLaMA-65B | 74GB | 80GB | A100 80GB | 128 GB |
*System RAM, not VRAM, is required to load the model, in addition to having enough VRAM. It is not required to run the model. You can use swap space if you do not have enough RAM.
4-bit Specifications for LLaMA
Model | Minimum Total VRAM | Card examples | RAM/Swap to Load* |
---|---|---|---|
LLaMA-7B | 6GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 | 6 GB |
LLaMA-13B | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 12 GB |
LLaMA-30B | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB |
LLaMA-65B | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 | 64 GB |
*System RAM, not VRAM, is required to load the model, in addition to having enough VRAM. It is not required to run the model. You can use swap space if you do not have enough RAM.
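As a rough illustration of how the two tables can be used, the sketch below looks up the largest LLaMA size whose "Minimum Total VRAM" fits a given card. The numbers are copied from the tables above; the function name `largest_model_that_fits` and the dictionary layout are hypothetical, and real headroom depends on context length and the loader you use.

```python
# Minimum total VRAM in GB, copied from the 8-bit and 4-bit tables above.
MIN_VRAM_GB = {
    "8-bit": {"LLaMA-7B": 10, "LLaMA-13B": 20, "LLaMA-30B": 40, "LLaMA-65B": 80},
    "4-bit": {"LLaMA-7B": 6, "LLaMA-13B": 10, "LLaMA-30B": 20, "LLaMA-65B": 40},
}


def largest_model_that_fits(vram_gb, precision="4-bit"):
    """Return the largest listed model whose minimum VRAM fits, or None.

    Hypothetical helper for illustration; not part of any library.
    """
    fitting = [
        (need, name)
        for name, need in MIN_VRAM_GB[precision].items()
        if need <= vram_gb
    ]
    if not fitting:
        return None
    # Tuples compare by their first element, so max() picks the entry
    # with the highest VRAM requirement that still fits.
    return max(fitting)[1]


for vram in (8, 12, 24, 48):
    print(
        f"{vram} GB:",
        "4-bit ->", largest_model_that_fits(vram, "4-bit"),
        "| 8-bit ->", largest_model_that_fits(vram, "8-bit"),
    )
```

This only checks VRAM. The RAM/Swap to Load column still applies separately when loading the model.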
"Best choice" means the best option for most tasks. There are other options for different niches. For a model like Vicuna but with fewer restrictions, use GPT4 x Vicuna. For RP chatting, use base LLaMA 30B or 65B without LoRA and with a character card.
For writing stories, use the current best choice below if you want the least amount of effort for decent results. If you want highly detailed and personalized stories and don't mind spending a lot of time on prompting, use base LLaMA 30B or 65B without LoRA.
7B: Vicuna 7B v1.1
13B: Vicuna 13B v1.1
30B: Guanaco
65B: Guanaco 65B
7B 4-bit GPTQ: Vicuna 7B v1.1 4-bit
13B 4-bit GPTQ: Vicuna 13B v1.1 4-bit
30B 4-bit GPTQ: GPT4 Alpaca LoRA 30B Merge*
65B 4-bit GPTQ: Guanaco 65B 4-bit
7B: Vicuna v1.1
13B: Vicuna v1.1
30B: GPT4 Alpaca LoRA Merge*
65B: Guanaco 65B
*Use OASST LLaMA 30B below for the closest ChatGPT clone.
r/LocalLLaMA does not endorse, claim responsibility for, or associate with any models, groups, or individuals listed here. If you would like your link added or removed from this list, please send a message to modmail.
This list is not comprehensive but should include most relevant links. If you plan on copying this list to use elsewhere but won't be updating it yourself, feel free to link back to this wiki page as this will be kept updated with the latest downloads.
Some links may have multiple formats. Always use .safetensors when available.
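A minimal sketch of that preference, assuming you already have a list of file names from a download page. The helper name `pick_weight_file` and the example file names are hypothetical.

```python
from pathlib import PurePosixPath


def pick_weight_file(filenames):
    """Prefer a .safetensors file; otherwise fall back to the first file listed.

    Hypothetical helper for illustration only.
    """
    if not filenames:
        return None
    for name in filenames:
        if PurePosixPath(name).suffix.lower() == ".safetensors":
            return name
    return filenames[0]


# Example file names are made up for this sketch.
print(pick_weight_file(["model-4bit.pt", "model-4bit.safetensors"]))
```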
Base 7B-65B 4-bit without groupsize can be downloaded here.
Base 7B-65B 4-bit with groupsize can be downloaded here.
Due to the increasing number of models available, parts of this section have been split into charts for easier comparison. The models listed directly below have been tested for their quality.
Models listed in the Extra section are not worse than the models in the chart but are generally unique in some way. For example, MedAlpaca was made for medical domain tasks, LLaVA for visual instruction, etc.
Sorted approximately from best to worst, subjective comparison by category:
7B
Models restricted
Models unrestricted
Extra: WizardVicunaLM Uncensored, LLaMA Deus V3 Merge, Pygmalion 7B, Pygmalion 7B 4-bit, Metharme 7B, Metharme 7B 4-bit, PubMed LLaMA 7B, MedAlpaca 7B, Alpaca Native Enhanced
1* This has lighter restrictions than Vicuna and was previously listed in the unrestricted section, but it may trend toward shorter generations than Vicuna.
2* This is better than AlpacaGPT4 in most areas, especially assistant tasks, but is generally worse for long creative generations.
3* This may be prone to light restrictions that do not necessarily impact the model's quality. The coherency of the model can initially seem dubious, but it works best when given a good prompt to start with. This 7B model is ideal for storywriting and should be adept at longer generations compared to others in the list.
13B
Models restricted
Models unrestricted
Models other
WizardVicunaLM Uncensored 4-bit 5*
Notable Mention: LLaMA with AlpacaGPT4 LoRA 13B for longer creative generations.
Extra: Manticore 13B (4-bit), GPT4All 13B snoozy (4-bit), Chronos 13B (4-bit), Pygmalion 13B (4-bit), Metharme 13B (4-bit), WizardLM 13B Uncensored (4-bit), Vicuna Evol-Instruct, LLaVA Delta, MedAlpaca 13B, GPT4 x Alpaca Roleplay Merge (4-bit V2), pretrained-sft-do2 (4-bit), Toolpaca, Vicuna 13B v0 (4-bit), WizardLM 13B 1.0 diff weights
1* StableVicuna has almost universally higher benchmarks than regular Vicuna, but it fails challenge questions that even Vicuna 7B can answer. It is also based on Vicuna v0. For real usage, its quality seems about on par or slightly worse than Vicuna v1.1.
2* Not completely unrestricted, and this model fails several logic tests that GPT4 x Alpaca passes. However, it may be better than GPT4 x Alpaca for creative tasks. While its restrictions are almost negligible, it inherits some of Vicuna's inherent limitations. Without proper prompting, this may result in generations with plot progressions and endings similar to ChatGPT's, e.g. "they lived happily ever after".
3* The original top choice for weeks and a model that can still be used today for various creative uses. GPT4 x Alpaca naturally produces flowery language that some may consider ideal for storytelling. However, this model may be considered the worst for following complex instructions.
4* This is an official release from the WizardLM team trained with the full dataset of 250K evolved instructions. It adopts the prompt format from Vicuna v1.1, and this model should be used over the older, experimental WizardVicunaLM.
5* This is an experimental model designed for proof of concept. It is a combination of WizardLM's dataset, ChatGPT's conversation extension, and Vicuna's tuning method.
30B
Models restricted
Models unrestricted
Extra: WizardLM 30B Uncensored, WizardLM 30B Uncensored 4-bit, OASST SFT 6 LLaMA 4-bit (commit 1c2afcb), OASST RLHF 2 LLaMA XOR, OASST SFT 7 LLaMA XOR
1* This is a finalized version of OASST LLaMA from Open Assistant.
2* This may be more prone to hallucinatory issues than the original Alpaca LoRA Merge.
65B
Extra: LLaMA-Adapter V2 Chat
Sorted alphabetically:
Vicuna Evol-Instruct Starcoder (13B)
*Alpaca LoRA Elina checkpoints are trained with longer cutoff lengths than their original counterparts. AlpacaGPT4 Elina supersedes GPT4 Alpaca.
**GPT4 Alpaca and GPT4 x Alpaca are not the same. GPT4 Alpaca uses the GPT-4 dataset from Microsoft Research.
Sorted alphabetically:
Chinese Alpaca LoRA (GitHub): 7B, 13B
Chinese ChatFlow (GitHub): 7B, 13B
Chinese LLaMA Extended (GitHub): 7B, 13B
Chinese LLaMA LoRA (GitHub): 7B, 13B
Chinese Vicuna LoRA (GitHub): 7B, 13B
French LoRA (GitHub): 7B, 13B, 30B
Italian LoRA (GitHub): 7B, 13B
Japanese LoRA (GitHub): 7B, 13B, 30B, 65B
Korean LoRA (GitHub): 13B, 30B, 65B
Models that aren't worth including are not listed here.
Update: The quantization format has been updated. All ggml model files using the old format will not work with the latest llama.cpp code. If you want to use models with the old format, commit cf348a6 is before the breaking change. This list may include a few models in the old format.
Sorted alphabetically:
7B
Extra or old format: MedAlpaca, Vicuna v0
13B
Extra or old format: Vicuna v0, OASST LLaMA, pretrained-sft-do2, Alpaca Native, Toolpaca
*This is an official release from the WizardLM team trained with the full dataset of 250K evolved instructions. It adopts the prompt format from Vicuna v1.1, and this model should be used over the older, experimental WizardVicunaLM.
**This is an experimental model designed for proof of concept. It is a combination of WizardLM's dataset, ChatGPT's conversation extension, and Vicuna's tuning method.
30B
Extra or old format: Alpaca LoRA Merge
65B
For optimal results, you need to use the correct prompt template for the model you're using. This section lists the main prompt templates and some examples of models that use each one. This list is not comprehensive.
Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
*your text here*
### Response:
Applies to: Alpaca LoRA, Alpaca Native, GPT4 Alpaca LoRA, GPT4 x Alpaca
Alpaca with Input
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
*your text here*
### Input:
*your text here*
### Response:
Applies to: Alpaca LoRA, Alpaca Native, GPT4 Alpaca LoRA, GPT4 x Alpaca
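The two Alpaca templates above can be filled in programmatically. The following is a sketch only: the function name `alpaca_prompt` is hypothetical, and the blank lines between sections are an assumption based on the commonly used Alpaca layout, since the listing above does not preserve blank lines.

```python
ALPACA_HEADER = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
ALPACA_HEADER_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request."
)


def alpaca_prompt(instruction, input_text=None):
    """Build an Alpaca-style prompt.

    Hypothetical helper; the blank-line spacing is an assumption.
    """
    if input_text:
        return (
            f"{ALPACA_HEADER_WITH_INPUT}\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        f"{ALPACA_HEADER}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


print(alpaca_prompt("Summarize the text.", "LLaMA is a family of language models."))
```

The model's generation is expected to continue after the final `### Response:` line.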
OpenAssistant LLaMA
<|prompter|>*your text here*<|endoftext|><|assistant|>
Applies to: OASST LLaMA 13B, OASST SFT 7 LLaMA, OASST RLHF 2 LLaMA, pretrained-sft-do2
Vicuna v0
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.
### Human: *your text here*
### Assistant:
Applies to: StableVicuna v0, Vicuna v0
Vicuna v1.1
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: *your text here*
ASSISTANT:
Applies to: StableVicuna v2, Vicuna Evol-Instruct, Vicuna v1.1, WizardVicunaLM and derivatives
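The Vicuna v0 and v1.1 templates differ in both the system line ("human" vs. "user") and the role markers, so mixing them up is an easy mistake. A small sketch that keeps them apart is shown here. The name `vicuna_prompt` is hypothetical, and the line breaks follow the listing above, which is an assumption about the exact whitespace each model expects.

```python
VICUNA_TEMPLATES = {
    "v0": (
        "A chat between a curious human and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the human's questions.\n"
        "### Human: {message}\n"
        "### Assistant:"
    ),
    "v1.1": (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
        "USER: {message}\n"
        "ASSISTANT:"
    ),
}


def vicuna_prompt(message, version="v1.1"):
    """Fill the single-turn Vicuna template for the given version.

    Hypothetical helper; whitespace between lines is an assumption.
    """
    if version not in VICUNA_TEMPLATES:
        raise ValueError(f"unknown Vicuna template version: {version!r}")
    return VICUNA_TEMPLATES[version].format(message=message)


print(vicuna_prompt("What is 4-bit quantization?"))
print(vicuna_prompt("What is 4-bit quantization?", version="v0"))
```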
GPT4 x Vicuna
### Instruction:
*your text here*
### Response:
or
### Instruction:
*your text here*
### Input:
*your text here*
### Response:
Guanaco QLoRA*
### Human: *your text here*
### Assistant:
*This should not be confused with the older Guanaco model made by a separate group and using a different dataset.
Metharme and Pygmalion
WizardLM 7B
*your text here*
### Response:
WizardLM 13B 1.0*
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: *your text here* ASSISTANT:
*This should not be confused with the older WizardLM models that use the dataset of 70K evolved instructions.
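To avoid pairing a model with the wrong template, the "Applies to" lines in this section can be turned into a lookup table. The sketch below covers only the templates that list their models explicitly above. The names `APPLIES_TO` and `template_for` are hypothetical, and matching is by exact model name as written in this section (case-insensitive).

```python
# Template name -> models, copied from the "Applies to" lines in this section.
APPLIES_TO = {
    "Alpaca": [
        "Alpaca LoRA", "Alpaca Native", "GPT4 Alpaca LoRA", "GPT4 x Alpaca",
    ],
    "OpenAssistant LLaMA": [
        "OASST LLaMA 13B", "OASST SFT 7 LLaMA", "OASST RLHF 2 LLaMA",
        "pretrained-sft-do2",
    ],
    "Vicuna v0": ["StableVicuna v0", "Vicuna v0"],
    "Vicuna v1.1": [
        "StableVicuna v2", "Vicuna Evol-Instruct", "Vicuna v1.1", "WizardVicunaLM",
    ],
}

# Invert the table so a model name maps to its template name.
MODEL_TO_TEMPLATE = {
    model.lower(): template
    for template, models in APPLIES_TO.items()
    for model in models
}


def template_for(model_name):
    """Return the template name for a listed model, or None if it is not listed."""
    return MODEL_TO_TEMPLATE.get(model_name.strip().lower())


print(template_for("Vicuna v1.1"))
print(template_for("GPT4 x Alpaca"))
print(template_for("Some Unlisted Model"))
```

Returning `None` for unlisted models is a deliberate choice in this sketch: guessing a template for an unknown model is more likely to hurt output quality than prompting you to check the model card.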