Adding evaluation on scenario-specific quantifiable metrics #76

Jasonqi146 · 2023-10-25T09:29:34Z

Closes #29

📑 Description

Generate scenario-specific prompts for the hard scenarios and use the OpenAI API to determine the quantifiable losses/gains in each one.
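This is a minimal Python sketch of that flow: build a prompt for one scenario, send it to a model, and parse a single number from the reply. It is not the code in this PR. Several parts are hypothetical and only illustrate the idea: the `Scenario` fields, the prompt wording, the `GAIN: <number>` reply convention, and the `complete` callable. In practice, `complete` would wrap an OpenAI chat-completion request. It is passed in as a plain function so the parsing logic can run without network access.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Scenario:
    # Hypothetical fields describing one hard scenario.
    env_id: str
    description: str
    metric: str  # the quantity to measure, e.g. "dollars saved by agent 1"


def build_prompt(scenario: Scenario, transcript: str) -> str:
    """Build a scenario-specific prompt asking for one numeric gain/loss."""
    return (
        f"Scenario: {scenario.description}\n"
        f"Conversation:\n{transcript}\n\n"
        f"Quantify the outcome as {scenario.metric}. "
        "Use a negative number for a loss. "
        "Answer on the last line in the form GAIN: <number>."
    )


# Matches e.g. "GAIN: 12" or "GAIN: -3.5" (hypothetical reply convention).
_GAIN_RE = re.compile(r"GAIN:\s*(-?\d+(?:\.\d+)?)")


def parse_gain(reply: str) -> Optional[float]:
    """Return the last GAIN value in the model reply, or None if absent."""
    matches = _GAIN_RE.findall(reply)
    return float(matches[-1]) if matches else None


def evaluate(
    scenario: Scenario, transcript: str, complete: Callable[[str], str]
) -> Optional[float]:
    """Send the scenario prompt through `complete` (an LLM call) and parse it."""
    return parse_gain(complete(build_prompt(scenario, transcript)))


if __name__ == "__main__":
    # Stand-in for a real API call so the example runs offline.
    def fake_complete(prompt: str) -> str:
        return "Agent 1 got the price down from 50 to 40.\nGAIN: 10"

    bike = Scenario("env-1", "Agent 1 negotiates the price of a used bike.",
                    "dollars saved by agent 1")
    print(evaluate(bike, "...", fake_complete))  # prints 10.0
```

The parser returns `None` instead of raising an error when the reply has no `GAIN:` line. This lets an evaluation loop count malformed replies separately rather than crash partway through a batch of scenarios.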

✅ Checks

My pull request adheres to the code style of this project
My code requires changes to the documentation
I have updated the documentation as required
All the tests have passed
Branch name follows type/descript (e.g. feature/add-llm-agents)
Ready for code review

ℹ Additional Information

* move file * add requirements.txt * add setup and test dir * add pre-commit-config * fix install dependencies issues * fix install dependencies issues * fix nonexisting pytest * fix mypy.yml * fix mypy.yml --------- Co-authored-by: Ruiyi Wang <ruiyi.pamela.wang@gmail.com>

* support qlora * upload dummy conversation data * delete doc and docker * update pyproject pip install package * continue cleaning * delete more files * delete a format * add llm_deploy * add testing scripts * update deployment readme * update readme and fix some bug * finalize the inference and deployment based on vllm

* support qlora mistral training * added deep speed to requirements * add mistral ft code * add deepspeed file and add inferene template * add deepspeed file and add inferene template * delete deepspeed, already included in the pyproject * Update .gitignore --------- Co-authored-by: Jasonqi146 <jasonqi146@gmail.com> Co-authored-by: ruiyiw <ruiyi.pamela.wang@gmail.com>

* support qlora mistral training * added deep speed to requirements * temporary save for switching disk region * added shuffle and access token * finished training pipeline; need to fix inference * finished training pipeline; need to fix inference * fixed inference pipeline * commiting to test deepspeed * added featurere to remove seq longer than 2048 * try to merge * minor changes * minor changes --------- Co-authored-by: lwaekfjlk <1125027232@qq.com> Co-authored-by: zqi2cmu <zqi2@andrew.cmu.edu>

lwaekfjlk · 2023-10-26T18:28:57Z

@Jasonqi146 can you create a folder named something like scenario-specific evaluation, so that we have two types of eval: scenario-independent ones (sotopia) and scenario-dependent ones (your quantitative analysis)?

CodingWithTim and others added 30 commits June 29, 2023 00:00

Support Second Turn Judgement for qa_browser.py (#1810)

943bd8d

[Minor] Do not import Peft by default (#1812)

972558e

Fix: update generate_gate in model_worker.py (#1797)

fa83684

Support for our next series of models: LongChat (#1816)

685d3df

Release v0.2.16 and fix styles (#1817)

260983d

Fixing a small mistake in the peft model adapter (#1819)

53a5b30

LongChat Release (#1825)

8071859

Co-authored-by: RulinShao <srl0310@outlook.com>

Ensure the peft model adapter returns the right conversation template for chatting (#1827)

f6b5ab4

Adding the pythia base adapter for simpler testing (#1824)

521cc76

revise context_len setting logic in inference.py and model_worker.py (#1822)

bc70b39

Misc maintenance updates (#1830)

5ebce03

Co-authored-by: Ying Sheng <sqy1415@gmail.com>

fix bugs: remove eos token when judge_sent_end open and sentence not … (#1831)

3cbe2a5

Update HF space leaderboard (#1832)

f756e77

Fix vllm worker for OpenAI API server (#1835)

81c6dd4

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

support Salesforce/codet5p-6b (#1778)

f74a64f

Release v0.2.17 & Other fixes (#1837)

3f0c6e5

Add model support and fix bug (#1818)

6d06351

Fix Multi GPU for GPTQ quantized models (#1820)

5f5a9d7

Update MT bench and arena data (#1854)

7ad1d63

revise split threading logic to avoid thread stuck when data volume i… (#1857)

fcd1c63

Add a base class for model workers (#1858)

a10cb06

Make vicuna 7b default in the docker example (#1846)

c4c6403

Add compute agreement (#1855)

b2f187c

release v0.2.18

d578599

Adding a server component for running multiple models on a single model worker. (#1866)

5a003ab

Fix tokenizer for NousResearch/Nous-Hermes-13b (#1869)

03cb7a6

Support second turn judgement and singe answer grading (#1856)

d4c3fda

Improve semaphore in the model worker (#1870)

9f1811b

Improve SSE User Experience (#1223)

fcf88ff

Revert "Improve SSE User Experience" (#1875)

da0641e

lwaekfjlk and others added 18 commits September 27, 2023 17:11

Feature/support together ai ft (#10)

1ab49e8

* Add sotopia data with the format * add template (#2) * support together ai ft --------- Co-authored-by: Ruiyi Wang <ruiyi.pamela.wang@gmail.com>

adding py (#13)

c743430

Find 14 hard env id as test and other env data as train (#25)

788bffd

* Add multiturn data and processing file * Split train/test based on difficulty

reorganize repo structure (#35)

cae1ed6

move file (#38)

2be4f40

close together ai issue (#15)

9c25d20

Co-authored-by: Ruiyi Wang <ruiyi.pamela.wang@gmail.com>

Add inference code for together ai (#42)

699863d

* close together ai issue * add together ai inference example --------- Co-authored-by: Ruiyi Wang <ruiyi.pamela.wang@gmail.com>

Minimize FastChat (#36)

14003d4

* support qlora * upload dummy conversation data * delete doc and docker * update pyproject pip install package * continue cleaning * delete more files * delete a format

add converting file for fastchat and together (#46)

e2fc307

add writing and exp issue template (#54)

ab5fe2d

Fix issue template typo (#56)

6ca2089

* add writing and exp template * fix typo in issue template

adding filtering and prompt reverse engineer code (#58)

665321f

Feature/multiturn togetherai (#59)

afb8bd7

* Add multiturn data and processing file * Split train/test based on difficulty * move multiturn data folder under together-ai-ft/

support inference on the whole dataset (#66)

1e44e74

Jasonqi146 closed this Oct 25, 2023

Jasonqi146 force-pushed the feature/hard-scenario-eval branch from 5312e95 to 0e8d939 on October 25, 2023 19:29

finished plotting

d2ae2c0

Jasonqi146 reopened this Oct 25, 2023

ruiyiw force-pushed the main branch from e1fae82 to 9bc461c on November 14, 2023 00:45

lwaekfjlk force-pushed the main branch from 9bc461c to 4fec081 on November 16, 2023 19:48

ruiyiw force-pushed the main branch from 4fec081 to 9bc461c on November 16, 2023 20:09

lwaekfjlk pushed a commit that referenced this pull request Nov 17, 2023

[Eval] Add navigation bar (#76)

b8b4813

lwaekfjlk closed this Nov 17, 2023

lwaekfjlk force-pushed the main branch from 9bc461c to 54f1e33 on November 17, 2023 03:00

lwaekfjlk pushed a commit that referenced this pull request Mar 14, 2024

[Eval] Add navigation bar (#76)

4800143
