This README outlines the key changes and additions made in this repository compared to the original GPT-NeoX codebase. Our aim is to maintain transparency about the updates and improvements made to it.
- Add SFT data processing (supporting `alpaca` (single-round) && `sharegpt` (multi-round) formats) and a dataset splicing mode; fix some bugs.
- Add `reset_mask` and `reset_id` to control whether a spliced sample can attend to the samples packed in front of it and to reset the position ids at each sample boundary (usable with `flash_attention`, `flash_attention_triton`, and global attention, but requires more testing); see the first sketch after this list.
- Add `flash_attention` v1 && v2, selected according to the installed version; `reset_mask` is currently supported in training mode. `flash_attention_triton` only supports v1 and can support `reset_mask`, but it is difficult to test and the loss differs slightly, although the trend is the same. A version-detection sketch follows this list.
- Merge `Llama2` and `Llama1`. The main difference is how the `qkv` weights are spliced: when `GQA/MQA` is used, `torch.cat(QKV)` is applied, otherwise `torch.stack(QKV)`. This is reflected in `./tools/convert_neox_llama_weights_to_hf.py` and `./tools/convert_raw_llama_weights_to_neox.py` (see the sketch after the weight-conversion command below).
- `Rotary Position Embedding` supports `Dynamic Scaled` RoPE to use a longer sequence length, with the scale factor controlled through `neox_args.ntk`; see the last sketch after this list.
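The following is a minimal, illustrative sketch of what `reset_mask` and `reset_id` do conceptually when several samples are spliced into one sequence. The function name, the EOD-delimiter convention, and the dense mask are assumptions for illustration only, not the repository's actual implementation.

```python
import torch

def packed_position_ids_and_mask(tokens: torch.Tensor, eod_id: int):
    """Illustrative only: rebuild position ids and an attention mask for a
    sequence produced by splicing several samples together, separated by
    `eod_id` tokens.

    - `reset_id`-style behaviour: position ids restart from 0 after every EOD.
    - `reset_mask`-style behaviour: a token may only attend to tokens of its
      own sample (block-diagonal causal mask), not to samples packed in front.
    """
    seq_len = tokens.size(0)
    position_ids = torch.arange(seq_len)
    # Causal mask: True means "may attend".
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

    # Every EOD token closes a sample; the next token starts a new one.
    for b in (tokens == eod_id).nonzero(as_tuple=False).flatten().tolist():
        start = b + 1
        if start >= seq_len:
            continue
        # reset_id: positions restart after the boundary.
        position_ids[start:] = torch.arange(seq_len - start)
        # reset_mask: tokens after the boundary cannot see earlier samples.
        mask[start:, :start] = False
    return position_ids, mask
```

With both resets applied, packed samples are trained as if they were independent sequences, which is the intent of the `reset_*` options.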
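Because flash-attention v1 and v2 expose different kernels, the code has to branch on the installed version. A minimal sketch of such a check (the dispatch messages below are illustrative, not the repository's actual code path):

```python
# Illustrative version check; the actual dispatch in this repo may differ.
try:
    import flash_attn
    FLASH_ATTN_VERSION = tuple(int(x) for x in flash_attn.__version__.split(".")[:2])
except ImportError:
    FLASH_ATTN_VERSION = None

if FLASH_ATTN_VERSION is None:
    print("flash-attn not installed; falling back to global attention")
elif FLASH_ATTN_VERSION >= (2, 0):
    print("using the flash-attention v2 kernels")
else:
    print("using the flash-attention v1 kernels")
```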
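Dynamic scaling of RoPE works by enlarging the rotary base when the running sequence length exceeds the length the model was trained on. A minimal sketch of the idea, using the commonly used dynamic-NTK formula; `ntk_scale`, `max_trained_len`, and the function name are illustrative and do not necessarily match the exact `neox_args` fields:

```python
import torch

def dynamic_ntk_inv_freq(dim: int, seq_len: int, max_trained_len: int,
                         ntk_scale: float = 1.0, base: float = 10000.0):
    """Illustrative dynamic NTK scaling: when `seq_len` exceeds the trained
    context length, grow the rotary base so the rotations stretch to cover
    the longer sequence instead of extrapolating."""
    if seq_len > max_trained_len:
        # Commonly used dynamic-NTK formula; `ntk_scale` plays the role of
        # the user-controlled scale factor (cf. `neox_args.ntk`).
        base = base * (
            (ntk_scale * seq_len / max_trained_len) - (ntk_scale - 1)
        ) ** (dim / (dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return inv_freq
```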
TODO:
- known issue: bf16 + ZeRO stage 1 + CPU offload;
- inference: web demo + API;
- more evaluation: lm_eval + HELM;
- more docs and logs.
SFT data processing is currently only tested on `alpaca` and `sharegpt` data: `alpaca` is single-round dialogue and `sharegpt` is multi-round dialogue; both can be used as references when adding other formats.
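For illustration, a single record in each of the two raw formats typically looks like the following. The field names follow the widely used alpaca/sharegpt conventions and are assumptions here, not a specification of this repository's loader:

```python
# Assumed record layouts, following the common alpaca / sharegpt conventions.
alpaca_record = {
    "instruction": "Summarize the following paragraph.",
    "input": "GPT-NeoX is a library for training large language models ...",
    "output": "GPT-NeoX is a large-scale LM training library ...",
}

sharegpt_record = {
    "conversations": [
        {"from": "human", "value": "What is supervised fine-tuning?"},
        {"from": "gpt", "value": "Supervised fine-tuning trains a model on prompt-response pairs ..."},
        {"from": "human", "value": "How is multi-round data handled?"},
        {"from": "gpt", "value": "All turns are spliced into one sequence ..."},
    ],
}
```

After preparing raw data in these layouts, run the processing script for each dataset: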
```
python prepare_data_sft.py -d ./data/sft/alpaca_gpt4 -i ./data/raw_data/alpaca_gpt4 -t SPMTokenizer -v ./vocab_file/tokenizer.model alpaca_gpt4
python prepare_data_sft.py -d ./data/sft/sharegpt -i ./data/raw_data/sharegpt -t SPMTokenizer -v ./vocab_file/tokenizer.model sharegpt
```
This generates the corresponding `.bin` and `.idx` files for `text` and `label`, respectively.
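A likely reason for storing `text` and `label` separately is that SFT usually only computes the loss on response tokens. A minimal sketch of that idea; the helper below and the -100 ignore index (the usual PyTorch `CrossEntropyLoss` convention) are illustrative assumptions, not this repository's code:

```python
IGNORE_INDEX = -100  # standard PyTorch CrossEntropyLoss ignore index

def build_labels(prompt_ids, response_ids):
    """Illustrative: mask prompt tokens out of the loss, keep response tokens."""
    text = list(prompt_ids) + list(response_ids)
    label = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return text, label

text, label = build_labels([101, 2023, 2003], [1037, 7099, 102])
# text  -> [101, 2023, 2003, 1037, 7099, 102]
# label -> [-100, -100, -100, 1037, 7099, 102]
```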
Convert the raw Llama weights to the NeoX format (the example below shards a 70B Llama2 checkpoint for pipeline parallelism):

```
python ./tools/convert_raw_llama_weights_to_neox.py --input_dir {raw_model_path} --model_size 70B --output_dir ./model/pretrain/llama2/70B --num_output_shards 8 --pipeline_parallel
```
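As noted in the change list above, the structural difference the conversion scripts handle between Llama1-style MHA and Llama2-style GQA/MQA is how the per-head `Q/K/V` weights are spliced together. A rough sketch of that distinction; the tensor names and shapes here are illustrative, not the exact ones used by the scripts:

```python
import torch

def splice_qkv(q, k, v, use_gqa_or_mqa: bool):
    """Illustrative: combine Q/K/V projection weights into one tensor.

    - MHA (Llama1-style): Q, K, and V have identical shapes, so they can be
      stacked along a new leading dimension -> torch.stack.
    - GQA/MQA (e.g. Llama2 70B): K and V have fewer heads than Q, so the
      shapes differ and the weights are concatenated instead -> torch.cat.
    """
    if use_gqa_or_mqa:
        return torch.cat([q, k, v], dim=0)
    return torch.stack([q, k, v], dim=0)

# MHA example: all three projections are (hidden, hidden).
q = k = v = torch.randn(4096, 4096)
print(splice_qkv(q, k, v, use_gqa_or_mqa=False).shape)  # torch.Size([3, 4096, 4096])

# GQA example: K/V are smaller because they have fewer heads.
q = torch.randn(8192, 8192)
k = v = torch.randn(1024, 8192)
print(splice_qkv(q, k, v, use_gqa_or_mqa=True).shape)   # torch.Size([10240, 8192])
```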
Modify your slurm and config files; refer to `./custom_config` for examples.
Launch the SFT run:

```
sbatch run_sft.slurm  # please adjust the slurm script and configs for your environment.
```
After training, convert the NeoX checkpoint back to the Hugging Face format:

```
python ./tools/convert_neox_llama_weights_to_hf.py --input_dir ./model/pretrain/llama2/70B/global_step0/ --model_size 70B --output_dir ./model/pretrain/llama2/70B_hf
```