Feature/move multiturn data (#84)
* support QLoRA Mistral training

* added DeepSpeed to requirements

* temporary save for switching disk region

* added shuffle and access token

* finished training pipeline; need to fix inference

* finished training pipeline; need to fix inference

* fixed inference pipeline

* committing to test DeepSpeed

* added feature to remove sequences longer than 2048

* try to merge

* minor changes

* minor changes

* Move together data

* rename data process files and add together multiturn data preprocess

---------

Co-authored-by: lwaekfjlk <1125027232@qq.com>
Co-authored-by: Jasonqi146 <jasonqi146@gmail.com>
Co-authored-by: zqi2cmu <zqi2@andrew.cmu.edu>
Co-authored-by: Wonderplex <50866817+Jasonqi146@users.noreply.github.com>
(cherry picked from commit 6f285ba)
ruiyiw authored and lwaekfjlk committed Mar 14, 2024
1 parent e8581e4 commit c018e01
Showing 2 changed files with 20 additions and 0 deletions.
Empty file removed: data_process/dummyfile
data_process/fastchat_data_preprocess.py (20 additions, 0 deletions)
@@ -0,0 +1,20 @@
import json
import os

# Directory of raw sotopia fine-tuning episodes, one JSON file per episode.
sotopia_data_dir = "/Users/pamela/Documents/capstone/sotopia-ft-data/ft-data-gpt4-gpt4-easy-2-side-partial"

ft_data_list = []
count = 0
for file in os.listdir(sotopia_data_dir):
    with open(os.path.join(sotopia_data_dir, file), 'r') as f:
        file_dict = json.load(f)
        # Wrap each (prompt, result) pair as a single-turn FastChat conversation.
        fastchat_dict = {"id": f"identity_{count}", "conversations": []}
        fastchat_dict["conversations"].append(
            {"from": "human", "value": file_dict["prompt"]})
        fastchat_dict["conversations"].append(
            {"from": "gpt", "value": file_dict["result"]})
        ft_data_list.append(fastchat_dict)
        count += 1

with open("fastchat-ft-gp4-gpt4-easy-2-side-partial.json", "w") as f:
    f.write(json.dumps(ft_data_list, indent=4))
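
For reference, each record in the resulting JSON file follows FastChat's conversation schema as built by the loop above; the values below are illustrative placeholders, not actual data from the repository:

[
    {
        "id": "identity_0",
        "conversations": [
            {"from": "human", "value": "<prompt text from the sotopia episode file>"},
            {"from": "gpt", "value": "<result text from the sotopia episode file>"}
        ]
    }
]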
