Models

  • T5
    • Paper
    • Architecture
      • Encoder-Decoder
  • GPT
  • GPT-Neo
  • GPT-J-6B
  • Megatron-11B
  • PanGu-α-13B
  • FairSeq
  • GLaM
  • LaMDA
  • JURASSIC-1
  • MT-NLG
  • ERNIE
  • Gopher
    • Paper
    • Conclusion
      • Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, whereas logical and mathematical reasoning sees less benefit.
  • Chinchilla
    • Paper
    • Conclusion
      • We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant.
      • We find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled (a worked scaling sketch follows this list).
  • PaLM
    • Paper
    • Architecture
      • Decoder
  • PaLM 2
  • OPT
    • Paper
    • Architecture
      • Decoder
  • GPT-NeoX
  • BLOOM
    • Paper
    • Architecture
      • Decoder
  • LLaMA
  • GLM
    • Paper
      • 2022-ACL-GLM: General Language Model Pretraining with Autoregressive Blank Infilling paper
      • 2023-ICLR-GLM-130B: An Open Bilingual Pre-trained Model paper
        • GitHub
        • Architecture
          • Autoregressive Blank Infilling (a data-construction sketch follows this list)
  • BloombergGPT
  • MOSS
  • OpenLLaMA: An Open Reproduction of LLaMA
  • Dolly
  • Panda
  • WeLM
  • Baichuan
  • Llama 2
  • Qwen
  • Chameleon
  • Mixtral
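
A quick numeric sketch of the Chinchilla rule quoted above. This is a minimal illustration under stated assumptions, not code from the paper: the ~20 tokens-per-parameter ratio is an approximation inferred from the paper's 70B-parameter / 1.4T-token compute-optimal model, the C ≈ 6·N·D FLOPs estimate is the common rule of thumb it relies on, and the function names are invented for illustration.

```python
# Minimal sketch of Chinchilla-style compute-optimal scaling:
# model size N and training tokens D are scaled in equal proportion,
# so doubling N also doubles D.

def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a model with
    n_params parameters (~20 tokens per parameter, inferred from the paper's
    70B-parameter / 1.4T-token example)."""
    return tokens_per_param * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard C ~= 6 * N * D estimate of training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

if __name__ == "__main__":
    for n in (1e9, 2e9, 70e9):            # 1B, 2B, 70B parameters
        d = compute_optimal_tokens(n)     # doubling n doubles d
        print(f"{n:.0e} params -> ~{d:.0e} tokens, ~{training_flops(n, d):.1e} FLOPs")
```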
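GLM's autoregressive blank infilling can likewise be sketched as a data-construction step. This is a simplified illustration of the objective described in the GLM paper, not its implementation: each sampled span is replaced by a single [MASK] token in the corrupted input (Part A), and the model regenerates the span tokens autoregressively (Part B), each span opened with [START] and closed with [END]. The span sampling, the shuffling of spans in Part B, the 2D positional encodings, and the attention masking from the paper are omitted, and the helper function below is illustrative only.

```python
MASK, START, END = "[MASK]", "[START]", "[END]"

def blank_infilling_example(tokens, spans):
    """Build a (Part A, Part B) pair in the style of GLM's autoregressive
    blank infilling. `spans` is a list of (start, length) pairs, assumed
    sorted and non-overlapping."""
    part_a, part_b, cursor = [], [], 0
    for start, length in spans:
        part_a += tokens[cursor:start] + [MASK]                   # one [MASK] per span in the corrupted input
        part_b += [START] + tokens[start:start + length] + [END]  # span to be generated autoregressively
        cursor = start + length
    part_a += tokens[cursor:]
    return part_a, part_b

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog".split()
    src, tgt = blank_infilling_example(text, spans=[(1, 2), (6, 1)])
    print(src)  # ['the', '[MASK]', 'fox', 'jumps', 'over', '[MASK]', 'lazy', 'dog']
    print(tgt)  # ['[START]', 'quick', 'brown', '[END]', '[START]', 'the', '[END]']
```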