- This project fine-tunes the LLaMA-2 model on instruction data, applying several memory-saving training techniques such as QLoRA, DDP (Distributed Data Parallel), and half-precision training (see the configuration sketch below).
- You can run it in a Kaggle or Colab notebook.
- The dataset I used is Bactrian-X, which covers 54 languages; however, I only worked with the Vietnamese subset.
- I experiment with LLaMA-2 7B. If your hardware allows, you can try the larger variants such as LLaMA-2 13B and LLaMA-2 70B.
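The memory-saving techniques above boil down to loading the frozen base model in 4-bit and training only a small LoRA adapter on top, with half-precision compute. Below is a minimal sketch of that setup using `transformers`, `bitsandbytes`, and `peft`; the checkpoint id, LoRA rank, and target modules are illustrative assumptions, not values read from this repo's `run.py`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # assumed LLaMA-2 7B checkpoint id

# 4-bit NF4 quantization with half-precision compute (the QLoRA recipe).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Train only a small LoRA adapter on top of the frozen quantized weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```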
- First, `!git clone` this repo, then install the dependencies with `!pip install --upgrade -r requirements.txt`.
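If you want to inspect the Vietnamese subset of Bactrian-X before training, a minimal sketch with the Hugging Face `datasets` library could look like this (the hub id `MBZUAI/Bactrian-X` and the `vi` config name are assumptions; check the dataset card to confirm them):

```python
from datasets import load_dataset

# Assumed hub id and language config for the Vietnamese split of Bactrian-X.
dataset = load_dataset("MBZUAI/Bactrian-X", "vi", split="train")

# Each record is expected to contain an instruction, an optional input, and an output.
print(dataset[0])
```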
- Train:
- You can use the script in my notebook to train from scratch. This is the path to the checkpoint after I trained the model for more than 1 epoch: checkpoint-1
- Note: `run.py` exposes several arguments that you can change as needed. If you stop training and the performance is not yet what you expect, you can resume training by passing the adapter model path to the `model_weight_path` argument and the state checkpoint path to the `state_checkpoint` argument in the script (see the launch sketch below).
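As an illustration only, a resumed multi-GPU (DDP) run might be launched roughly like this. Apart from `model_weight_path` and `state_checkpoint`, which are the arguments named above, the launcher and flag spelling are assumptions and may differ from what `run.py` actually expects:

```
# Assumed launch command; verify the argument names against run.py before using.
!torchrun --nproc_per_node=2 run.py \
    --model_weight_path "{your_adapter_checkpoint_path}" \
    --state_checkpoint "{your_state_checkpoint_path}"
```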
- Inference:
Inference template:

```python
from inference import Inference

infer = Inference(
    model_checkpoint = "{your_llama2-version}",
    model_weight_path = "{your_model_adapter_weight_path}",
)

instruction = "{your_instruction}"
input = "{your_input} or None"

print(infer(instruction = instruction, input = input)["response"])
```
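For example (hypothetical values: the base checkpoint id is assumed, and `checkpoint-1` refers to the adapter checkpoint linked in the Train section):

```python
from inference import Inference

# Hypothetical example values; replace them with your own checkpoint paths.
infer = Inference(
    model_checkpoint = "meta-llama/Llama-2-7b-hf",  # assumed LLaMA-2 7B checkpoint id
    model_weight_path = "checkpoint-1",             # adapter weights from the Train section
)

# "Explain the LLaMA-2 model." (Vietnamese); pass input = None when there is no extra context.
print(infer(instruction = "Giải thích về mô hình LLaMA-2.", input = None)["response"])
```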
Thanks a lot for reading! 😊