Commit

Update installation instructions and dataset download links

xingjian-zhang committed Jun 14, 2024
1 parent 16b2c0d commit 721c2bc

Showing 2 changed files with 35 additions and 38 deletions.
43 changes: 5 additions & 38 deletions README.md
@@ -15,11 +15,11 @@ computer science conferences spanning the past 50 years.
- **Accuracy**. The coverage and accuracy of MASSW have been validated through comprehensive inspections and comparisons with human annotations and alternative methods.
- **Rich benchmark tasks**. MASSW facilitates multiple novel and benchmarkable machine learning tasks, such as idea generation and outcome prediction. It supports diverse tasks centered on predicting, recommending, and expanding key elements of a scientific workflow, serving as a benchmark for evaluating LLM agents' ability to navigate scientific research.

## Install dependencies
## Installation

```bash
pip install -r requirements.txt # Install dependencies
pip install -e . # Install MASSW utils (e.g. metrics, download scripts, etc.)
pip install -e . # Install MASSW utils (e.g. metrics, evaluation scripts, etc.)
```

## Download MASSW dataset
@@ -30,39 +30,6 @@ To download the dataset, you can use the provided script:
python massw/download.py
```

Or download the dataset manually:

```bash
wget "https://www.dropbox.com/scl/fi/r2jlil9lj0ypo2fpl3fxa/massw_metadata_v1.jsonl?rlkey=ohnriak63x4ekyli25naajp0q&dl=1" -O data/massw_metadata_v1.jsonl
wget "https://www.dropbox.com/scl/fi/ykkrpf269fikuchy429l7/massw_v1.tsv?rlkey=mssrbgz3k8adij1moxqtj34ie&dl=1" -O data/massw_v1.tsv
```

## Reproduce experiments

To reproduce the benchmark results across different models and prompt types, run

```bash
python benchmark/aspect_prediction/task.py --model <model> --prompt <prompt_style> --num_samples 1020
```

where `<model>` is chosen from `{"gpt-35-turbo", "gpt-4", "mistral-8x7b"}` and
`<prompt_style>` is chosen from `{"zero-shot", "few-shot", "chain-of-thought",
"few-shot-cot"}`.

> We provide the benchmark outputs through a Dropbox link
> [here](https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=0).
> You can download the results and unzip them into the
> `benchmark/aspect_prediction/outputs` directory by running:
>
> ```bash
> wget "https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=1" -O results_v1.zip
> unzip results_v1.zip -d benchmark/aspect_prediction
> rm results_v1.zip
> mv benchmark/aspect_prediction/results benchmark/aspect_prediction/outputs
> ```

After running the tasks, evaluate the outcomes by running:

```bash
python benchmark/aspect_prediction/eval.py --model_output_dir benchmark/aspect_prediction/outputs/gpt-35-turbo_zero-shot
```
Or download the dataset manually through Dropbox links:
[[MASSW dataset](https://www.dropbox.com/scl/fi/ykkrpf269fikuchy429l7/massw_v1.tsv?rlkey=mssrbgz3k8adij1moxqtj34ie&dl=1)]
[[MASSW metadata](https://www.dropbox.com/scl/fi/r2jlil9lj0ypo2fpl3fxa/massw_metadata_v1.jsonl?rlkey=ohnriak63x4ekyli25naajp0q&dl=1)].
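
Once the files are downloaded, a minimal Python sketch like the one below can be used to inspect them. It assumes the files were saved to `data/` as in the download commands above; the individual field names are not spelled out in this README, so the sketch prints them rather than assuming them.

```python
import json

import pandas as pd

# Load the tab-separated MASSW dataset (path assumed from the download commands above).
massw = pd.read_csv("data/massw_v1.tsv", sep="\t")
print(massw.shape)
print(massw.columns.tolist())  # Inspect the actual column names rather than assuming them.

# Load the metadata, stored as JSON Lines (one JSON object per line).
with open("data/massw_metadata_v1.jsonl", encoding="utf-8") as f:
    metadata = [json.loads(line) for line in f]
print(f"{len(metadata)} metadata records")
print(metadata[0].keys())  # Inspect the available metadata fields.
```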
30 changes: 30 additions & 0 deletions benchmark/aspect_prediction/README.md
@@ -0,0 +1,30 @@
# Benchmark

To reproduce the benchmark results across different models and prompt types, run

```bash
python benchmark/aspect_prediction/task.py --model <model> --prompt <prompt_style> --num_samples 1020
```

where:

- `<model>` is chosen from `gpt-35-turbo`, `gpt-4`, `mistral-8x7b`.
- `<prompt_style>` is chosen from `zero-shot`, `few-shot`, `chain-of-thought`, `few-shot-cot`.

> We provide the benchmark outputs through a Dropbox link
> [here](https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=0).
> You can download the results and unzip them into the
> `benchmark/aspect_prediction/outputs` directory by running:
>
> ```bash
> wget "https://www.dropbox.com/scl/fi/nap87vh9s2mc7v3daql5u/results_v1.zip?rlkey=m1n5vck90quwhqygiq1otn2zp&dl=1" -O results_v1.zip
> unzip results_v1.zip -d benchmark/aspect_prediction
> rm results_v1.zip
> mv benchmark/aspect_prediction/results benchmark/aspect_prediction/outputs
> ```

After running the tasks, evaluate the outcomes by running:

```bash
python benchmark/aspect_prediction/eval.py --model_output_dir benchmark/aspect_prediction/outputs/gpt-35-turbo_zero-shot
```
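
To sweep every model and prompt combination rather than issuing each command by hand, a small driver script along these lines could be used. This is a sketch, not part of the repository: it assumes the output directories follow the `<model>_<prompt_style>` naming convention seen in the example path above.

```python
import subprocess

# Models and prompt styles listed in this README.
MODELS = ["gpt-35-turbo", "gpt-4", "mistral-8x7b"]
PROMPTS = ["zero-shot", "few-shot", "chain-of-thought", "few-shot-cot"]

for model in MODELS:
    for prompt in PROMPTS:
        # Run the aspect-prediction task for this model/prompt pair.
        subprocess.run(
            ["python", "benchmark/aspect_prediction/task.py",
             "--model", model, "--prompt", prompt, "--num_samples", "1020"],
            check=True,
        )
        # Evaluate the outputs; the directory name assumes the
        # "<model>_<prompt_style>" convention used in the example above.
        subprocess.run(
            ["python", "benchmark/aspect_prediction/eval.py",
             "--model_output_dir",
             f"benchmark/aspect_prediction/outputs/{model}_{prompt}"],
            check=True,
        )
```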
