This repo is a PaddlePaddle implementation of Meta's LLaMA.
3/3: added CPU example.
3/4: added GPU example.
3/5: added a simple chatbot demo with ipywidgets: chatbot.
```shell
git clone https://github.com/jiaohuix/ppllama.git
cd ppllama && pip install -r requirements.txt
pip install -e ./
```
To download the checkpoints, fill out this Google form (the tokenizer is already included in ckpt).
```shell
# download checkpoints
bash scripts/download.sh <MODEL_SIZE>(7B/13B/30B/65B) <TARGET_FOLDER> <PRESIGNED_URL>
```
The following is the checkpoints directory layout:

```
ckpt
├── 13B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   └── params.json
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── model0.pdparams
│   └── params.json
├── tokenizer_checklist.chk
└── tokenizer.model
```
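Each model directory ships a `params.json` with the model hyperparameters. A minimal sketch of reading it before loading a checkpoint (the helper name `load_model_args` and the exact field set are illustrative; LLaMA's `params.json` typically carries fields such as `dim`, `n_layers`, and `n_heads`):

```python
import json
from pathlib import Path

def load_model_args(ckpt_dir: str) -> dict:
    """Read the model hyperparameters stored alongside each checkpoint.

    Hypothetical helper: returns the parsed params.json as a plain dict,
    e.g. {"dim": 4096, "n_layers": 32, "n_heads": 32, ...} for 7B.
    """
    with open(Path(ckpt_dir) / "params.json") as f:
        return json.load(f)
```

Usage: `args = load_model_args("ckpt/7B")`, then pass the values into the model constructor.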
This repository contains scripts for converting checkpoints from PyTorch to Paddle. I use a 2-layer model for inference to verify that ppllama and llama are aligned; see: align
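The core of such a conversion is a state-dict translation. One detail matters: `torch.nn.Linear` stores its weight as `[out_features, in_features]`, while `paddle.nn.Linear` expects `[in_features, out_features]`, so 2-D linear weights must be transposed, whereas embeddings and norm weights keep their layout. A minimal numpy sketch of that idea (the function name, the `"embed"` naming heuristic, and the use of numpy instead of real torch/paddle tensors are all assumptions, not the repo's actual script):

```python
import numpy as np

def convert_state_dict(torch_state: dict) -> dict:
    """Sketch of a torch -> paddle state-dict conversion.

    Transposes 2-D linear weights (torch: [out, in] -> paddle: [in, out]);
    leaves embeddings and 1-D norm weights untouched. The "embed" substring
    check is a hypothetical heuristic for skipping embedding tables.
    """
    paddle_state = {}
    for name, tensor in torch_state.items():
        arr = np.asarray(tensor)
        if arr.ndim == 2 and "embed" not in name:
            arr = arr.T  # linear weight: swap to paddle's [in, out] layout
        paddle_state[name] = arr
    return paddle_state
```

In a real script the arrays would come from `torch.load(...)` and be saved with `paddle.save(...)` as `.pdparams`.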
Environment configuration and speed:

| Device | Memory | Load speed | Inference speed |
|---|---|---|---|
| CPU | 32 GB | 6 min | 20 min (1 prompt) |
| CUDA | 32 GB | - | 15 sec (4 prompts) |
CPU:

```shell
python -m paddle.distributed.launch scripts/example_cpu.py --prompt "The capital of Germany is the city of" --mp 1 --ckpt_dir ckpt/7B/ --tokenizer_path ckpt/tokenizer.model
```
GPU:

```shell
python -m paddle.distributed.launch scripts/example.py --mp 1 --ckpt_dir ckpt/7B/ --tokenizer_path ckpt/tokenizer.model
```
If you like this project, please show your support by leaving a star ⭐.
See the LICENSE file.