Models

  • T5
    • Paper
    • Architecture
      • Encoder-Decoder
  • GPT
  • GPT-Neo
  • GPT-J-6B
  • Megatron-11B
  • PanGu-α-13B
  • FairSeq
  • GLaM
  • LaMDA
  • JURASSIC-1
  • MT-NLG
  • ERNIE
  • Gopher
    • Paper
    • Conclusion
      • Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, whereas logical and mathematical reasoning sees less benefit.
  • Chinchilla
    • Paper
    • Conclusion
      • We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant.
      • We find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled (a worked scaling sketch follows this list).
  • PaLM
    • Paper
    • Architecture
      • Decoder
  • PaLM 2
  • OPT
    • Paper
    • Architecture
      • Decoder
  • GPT-NeoX
  • BLOOM
    • Paper
    • Architecture
      • Decoder
  • LLaMA
  • GLM
    • Paper
      • 2022-ACL-GLM: General Language Model Pretraining with Autoregressive Blank Infilling paper
      • 2023-ICLR-GLM-130B: An Open Bilingual Pre-trained Model paper
        • GitHub
        • Architecture
          • Autoregressive Blank Infilling (a data-construction sketch follows this list)
  • BloombergGPT
  • MOSS
  • OpenLLaMA: An Open Reproduction of LLaMA
  • Dolly
  • Panda
  • WeLM
  • Baichuan
  • Llama 2
  • Qwen
  • Chameleon
  • Mixtral
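
A quick numeric sketch of the Chinchilla rule quoted above. This is a minimal illustration under stated assumptions, not code from the paper: the ~20 tokens-per-parameter ratio is an approximation inferred from the paper's 70B-parameter / 1.4T-token compute-optimal model, the C ≈ 6·N·D FLOPs estimate is the common rule of thumb it relies on, and the function names are invented for illustration.

```python
# Minimal sketch of Chinchilla-style compute-optimal scaling:
# model size N and training tokens D are scaled in equal proportion,
# so doubling N also doubles D.

def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a model with
    n_params parameters (~20 tokens per parameter, inferred from the paper's
    70B-parameter / 1.4T-token example)."""
    return tokens_per_param * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard C ~= 6 * N * D estimate of training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

if __name__ == "__main__":
    for n in (1e9, 2e9, 70e9):            # 1B, 2B, 70B parameters
        d = compute_optimal_tokens(n)     # doubling n doubles d
        print(f"{n:.0e} params -> ~{d:.0e} tokens, ~{training_flops(n, d):.1e} FLOPs")
```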
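GLM's autoregressive blank infilling can likewise be sketched as a data-construction step. This is a simplified illustration of the objective described in the GLM paper, not its implementation: each sampled span is replaced by a single [MASK] token in the corrupted input (Part A), and the model regenerates the span tokens autoregressively (Part B), each span opened with [START] and closed with [END]. The span sampling, the shuffling of spans in Part B, the 2D positional encodings, and the attention masking from the paper are omitted, and the helper function below is illustrative only.

```python
MASK, START, END = "[MASK]", "[START]", "[END]"

def blank_infilling_example(tokens, spans):
    """Build a (Part A, Part B) pair in the style of GLM's autoregressive
    blank infilling. `spans` is a list of (start, length) pairs, assumed
    sorted and non-overlapping."""
    part_a, part_b, cursor = [], [], 0
    for start, length in spans:
        part_a += tokens[cursor:start] + [MASK]                   # one [MASK] per span in the corrupted input
        part_b += [START] + tokens[start:start + length] + [END]  # span to be generated autoregressively
        cursor = start + length
    part_a += tokens[cursor:]
    return part_a, part_b

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog".split()
    src, tgt = blank_infilling_example(text, spans=[(1, 2), (6, 1)])
    print(src)  # ['the', '[MASK]', 'fox', 'jumps', 'over', '[MASK]', 'lazy', 'dog']
    print(tgt)  # ['[START]', 'quick', 'brown', '[END]', '[START]', 'the', '[END]']
```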