Commit
Update installation instructions and dataset download links
1 parent 16b2c0d, commit 721c2bc
Showing 2 changed files with 35 additions and 38 deletions.
@@ -0,0 +1,30 @@
# Benchmark

To reproduce the benchmark results across different models and prompt types, run:

```bash
python benchmark/aspect_prediction/task.py --model <model> --prompt <prompt_style> --num_samples 1020
```

where:

- `<model>` is one of `gpt-35-turbo`, `gpt-4`, `mistral-8x7b`.
- `<prompt_style>` is one of `zero-shot`, `few-shot`, `chain-of-thought`, `few-shot-cot`.

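
To cover every model and prompt style listed above, the command can be generated for each combination. The following is a minimal sketch, not part of the repository: the helper name `build_commands` is hypothetical, and it uses only the flags shown in the command above.

```python
import itertools
import shlex

# Values copied from the option lists above.
MODELS = ["gpt-35-turbo", "gpt-4", "mistral-8x7b"]
PROMPT_STYLES = ["zero-shot", "few-shot", "chain-of-thought", "few-shot-cot"]


def build_commands(num_samples=1020):
    """Return one task.py argument list per (model, prompt style) pair.

    Hypothetical helper: it only assembles the documented command line.
    """
    commands = []
    for model, prompt in itertools.product(MODELS, PROMPT_STYLES):
        commands.append([
            "python", "benchmark/aspect_prediction/task.py",
            "--model", model,
            "--prompt", prompt,
            "--num_samples", str(num_samples),
        ])
    return commands


# Print the commands rather than running them.
for cmd in build_commands():
    print(shlex.join(cmd))
```

Each printed line is one benchmark run. To launch the runs from Python instead, each argument list could be passed to `subprocess.run(cmd, check=True)`.
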
> The benchmark outputs are available through a Dropbox link
> [here](https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=0).
> You can download the results and unzip them into the
> `benchmark/aspect_prediction/outputs` directory with:
>
> ```bash
> wget "https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=1" -O results_v1.zip
> unzip results_v1.zip -d benchmark/aspect_prediction
> rm results_v1.zip
> mv benchmark/aspect_prediction/results benchmark/aspect_prediction/outputs
> ```
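
Where `unzip` and `mv` are unavailable, the extract-and-rename steps above can be approximated with the Python standard library. This is a sketch under one assumption: that the archive unpacks to a top-level `results` folder, as the `mv` command implies. The function name `unpack_results` is hypothetical, and the archive is expected to be downloaded already.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_results(zip_path, task_dir="benchmark/aspect_prediction"):
    """Extract the archive into task_dir and rename results/ to outputs/.

    Hypothetical helper mirroring the unzip and mv steps above.
    """
    task_dir = Path(task_dir)
    task_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(task_dir)

    extracted = task_dir / "results"  # assumed top-level folder in the archive
    outputs = task_dir / "outputs"
    if not extracted.is_dir():
        raise FileNotFoundError(f"expected {extracted} after extraction")
    if outputs.exists():
        # Unlike mv, refuse to nest results/ inside an existing outputs/.
        raise FileExistsError(f"{outputs} already exists; move it aside first")
    shutil.move(str(extracted), str(outputs))
    return outputs
```

Calling `unpack_results("results_v1.zip")` from the repository root would leave the downloaded runs under `benchmark/aspect_prediction/outputs`. The sketch raises an error if that directory already exists, whereas `mv` would move `results` inside it.
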

After running the tasks, evaluate the results with:

```bash
python benchmark/aspect_prediction/eval.py --model_output_dir benchmark/aspect_prediction/outputs/gpt-35-turbo_zero-shot
```
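
The command above evaluates a single run. To evaluate every run in the outputs directory, one command per subdirectory can be generated. This is a minimal sketch, assuming each subdirectory of `outputs` holds one run, as the example path `gpt-35-turbo_zero-shot` suggests. The helper name `build_eval_commands` is hypothetical and only the documented `--model_output_dir` flag is used.

```python
from pathlib import Path


def build_eval_commands(outputs_dir="benchmark/aspect_prediction/outputs"):
    """Return one eval.py argument list per run directory, sorted by name.

    Hypothetical helper: assumes every subdirectory is a single run.
    """
    run_dirs = sorted(p for p in Path(outputs_dir).iterdir() if p.is_dir())
    return [
        ["python", "benchmark/aspect_prediction/eval.py",
         "--model_output_dir", str(run_dir)]
        for run_dir in run_dirs
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, or joined with `shlex.join` to print the equivalent shell command.
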