Commit

Update installation instructions and dataset download links

xingjian-zhang committed Jun 14, 2024
1 parent 16b2c0d commit 721c2bc

Showing 2 changed files with 35 additions and 38 deletions.
43 changes: 5 additions & 38 deletions README.md
@@ -15,11 +15,11 @@ computer science conferences spanning the past 50 years.
- **Accuracy**. The coverage and accuracy of MASSW have been validated through comprehensive inspections and comparisons with human annotations and alternative methods.
- **Rich benchmark tasks**. MASSW facilitates multiple novel and benchmarkable machine learning tasks, such as idea generation and outcome prediction. It supports diverse tasks centered on predicting, recommending, and expanding key elements of a scientific workflow, serving as a benchmark for evaluating LLM agents' ability to navigate scientific research.

## Install dependencies
## Installation

```bash
pip install -r requirements.txt # Install dependencies
pip install -e . # Install MASSW utils (e.g. metrics, download scripts, etc.)
pip install -e . # Install MASSW utils (e.g. metrics, evaluation scripts, etc.)
```

## Download MASSW dataset
@@ -30,39 +30,6 @@ To download the dataset, you can use the provided script:
python massw/download.py
```

Or download the dataset manually:

```bash
wget "https://www.dropbox.com/scl/fi/r2jlil9lj0ypo2fpl3fxa/massw_metadata_v1.jsonl?rlkey=ohnriak63x4ekyli25naajp0q&dl=1" -O data/massw_metadata_v1.jsonl
wget "https://www.dropbox.com/scl/fi/ykkrpf269fikuchy429l7/massw_v1.tsv?rlkey=mssrbgz3k8adij1moxqtj34ie&dl=1" -O data/massw_v1.tsv
```

## Reproduce experiments

To reproduce the benchmark results across different models and prompt types, run

```bash
python benchmark/aspect_prediction/task.py --model <model> --prompt <prompt_style> --num_samples 1020
```

where `<model>` is chosen from `{"gpt-35-turbo", "gpt-4", "mistral-8x7b"}` and
`<prompt_style>` is chosen from `{"zero-shot", "few-shot", "chain-of-thought",
"few-shot-cot"}`.

> We provide the benchmark outputs through a Dropbox link
> [here](https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=0).
> You can download the results and unzip them into the
> `benchmark/aspect_prediction/outputs` directory by running:
>
> ```bash
> wget "https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=1" -O results_v1.zip
> unzip results_v1.zip -d benchmark/aspect_prediction
> rm results_v1.zip
> mv benchmark/aspect_prediction/results benchmark/aspect_prediction/outputs
> ```

After running the tasks, evaluate the outcomes by running:

```bash
python benchmark/aspect_prediction/eval.py --model_output_dir benchmark/aspect_prediction/outputs/gpt-35-turbo_zero-shot
```
Or download the dataset manually through Dropbox links:
[[MASSW dataset](https://www.dropbox.com/scl/fi/ykkrpf269fikuchy429l7/massw_v1.tsv?rlkey=mssrbgz3k8adij1moxqtj34ie&dl=1)]
[[MASSW metadata](https://www.dropbox.com/scl/fi/r2jlil9lj0ypo2fpl3fxa/massw_metadata_v1.jsonl?rlkey=ohnriak63x4ekyli25naajp0q&dl=1)].
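
Once the files are downloaded, a minimal Python sketch like the one below can be used to inspect them. It assumes the files were saved to `data/` as in the download commands above; the individual field names are not spelled out in this README, so the sketch prints them rather than assuming them.

```python
import json

import pandas as pd

# Load the tab-separated MASSW dataset (path assumed from the download commands above).
massw = pd.read_csv("data/massw_v1.tsv", sep="\t")
print(massw.shape)
print(massw.columns.tolist())  # Inspect the actual column names rather than assuming them.

# Load the metadata, stored as JSON Lines (one JSON object per line).
with open("data/massw_metadata_v1.jsonl", encoding="utf-8") as f:
    metadata = [json.loads(line) for line in f]
print(f"{len(metadata)} metadata records")
print(metadata[0].keys())  # Inspect the available metadata fields.
```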
30 changes: 30 additions & 0 deletions benchmark/aspect_prediction/README.md
@@ -0,0 +1,30 @@
# Benchmark

To reproduce the benchmark results across different models and prompt types, run

```bash
python benchmark/aspect_prediction/task.py --model <model> --prompt <prompt_style> --num_samples 1020
```

where:

- `<model>` is chosen from `gpt-35-turbo`, `gpt-4`, `mistral-8x7b`.
- `<prompt_style>` is chosen from `zero-shot`, `few-shot`, `chain-of-thought`, `few-shot-cot`.

> We provide the benchmark outputs through a Dropbox link
> [here](https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=0).
> You can download the results and unzip them into the
> `benchmark/aspect_prediction/outputs` directory by running:
>
> ```bash
> wget "https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=1" -O results_v1.zip
> unzip results_v1.zip -d benchmark/aspect_prediction
> rm results_v1.zip
> mv benchmark/aspect_prediction/results benchmark/aspect_prediction/outputs
> ```

After running the tasks, evaluate the outcomes by running:

```bash
python benchmark/aspect_prediction/eval.py --model_output_dir benchmark/aspect_prediction/outputs/gpt-35-turbo_zero-shot
```
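
To sweep every model and prompt combination rather than issuing each command by hand, a small driver script along these lines could be used. This is a sketch, not part of the repository: it assumes the output directories follow the `<model>_<prompt_style>` naming convention seen in the example path above.

```python
import subprocess

# Models and prompt styles listed in this README.
MODELS = ["gpt-35-turbo", "gpt-4", "mistral-8x7b"]
PROMPTS = ["zero-shot", "few-shot", "chain-of-thought", "few-shot-cot"]

for model in MODELS:
    for prompt in PROMPTS:
        # Run the aspect-prediction task for this model/prompt pair.
        subprocess.run(
            ["python", "benchmark/aspect_prediction/task.py",
             "--model", model, "--prompt", prompt, "--num_samples", "1020"],
            check=True,
        )
        # Evaluate the outputs; the directory name assumes the
        # "<model>_<prompt_style>" convention used in the example above.
        subprocess.run(
            ["python", "benchmark/aspect_prediction/eval.py",
             "--model_output_dir",
             f"benchmark/aspect_prediction/outputs/{model}_{prompt}"],
            check=True,
        )
```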
