🐐 GOAT: Generalized Occupational Aptitude Test

deepvk/goat

GOAT

This project consists of three subprojects:

  • A parser for tasks from the Russian Unified State Exam (USE);
  • A script for validating HF models on the GOAT dataset;
  • A web app with a leaderboard of models validated on the GOAT dataset.

Installation

First, install the required libraries by running the following command:

pip install -r requirements.txt

Parser

This parser was used to gather tasks for the GOAT dataset. It is built on the Scrapy library.

Currently, the program parses Unified State Exam tests (EGE or OGE) from the sdamgia website.

Structure

The program takes the exam subject, exam type, test id, and the desired output file name as command-line arguments. The parsing result is stored in a JSONL file.

Additionally, the goat/parser folder contains a script called dataset_demonstration.py.

When you run it (instructions are provided below), it prints one task of each type from the parsed test to the console.
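As a rough illustration of the "one task of each type" idea, the sketch below reads JSONL lines and keeps the first record seen for each type. It is not the code of dataset_demonstration.py. The field name `type` and the sample records are assumptions made for the example, since the actual output schema of the parser is not described here.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of JSONL text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def first_task_per_type(tasks: Iterable[dict], type_key: str = "type") -> Dict[str, dict]:
    """Keep the first task seen for each value of `type_key`.

    `type_key` is a hypothetical field name; adjust it to the real schema.
    """
    seen: Dict[str, dict] = {}
    for task in tasks:
        # setdefault leaves an existing entry untouched, so the first task wins
        seen.setdefault(str(task.get(type_key)), task)
    return seen


# Made-up sample lines standing in for a parser output file
sample_lines = [
    '{"type": "choice", "id": 1}',
    '{"type": "text", "id": 2}',
    '{"type": "choice", "id": 3}',
]

for task_type, task in first_task_per_type(iter_jsonl(sample_lines)).items():
    print(task_type, json.dumps(task, ensure_ascii=False))
```

To apply it to a real file, pass an open file handle to `iter_jsonl` instead of `sample_lines`, since a text file iterates line by line.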

Usage

First, install the parser dependencies by running the following command:

pip install -e ".[parser]"

To run the parser, execute the following command from the goat/parser directory:

scrapy crawl sdamgia \
    -a subject='your exam subject' \
    -a exam_type='your exam type' \
    -a test_id='your test id' \
    -O <output file>

subject is the subject of the exam. Currently accepted values are 'soc' and 'lit'.

exam_type is the exam the test was taken from. Currently accepted values are 'ege' and 'oge'.

test_id is the id of the test for the chosen subject on the sdamgia website.

<output file> is the name of the file that the parser will create or overwrite with the parsing output, for example ege_data.jsonl.
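If you launch the parser from a script, it can help to check the arguments before calling Scrapy. The sketch below builds the same command line as above and rejects values outside the lists given in this README. The helper name `build_crawl_command` and the test id `12345` are illustrative and not part of the project.

```python
import shlex
from typing import List

# Values this README lists as currently accepted
SUBJECTS = {"soc", "lit"}
EXAM_TYPES = {"ege", "oge"}


def build_crawl_command(subject: str, exam_type: str, test_id: str, output: str) -> List[str]:
    """Return the argument list for `scrapy crawl sdamgia` (hypothetical helper)."""
    if subject not in SUBJECTS:
        raise ValueError(f"subject must be one of {sorted(SUBJECTS)}, got {subject!r}")
    if exam_type not in EXAM_TYPES:
        raise ValueError(f"exam_type must be one of {sorted(EXAM_TYPES)}, got {exam_type!r}")
    return [
        "scrapy", "crawl", "sdamgia",
        "-a", f"subject={subject}",
        "-a", f"exam_type={exam_type}",
        "-a", f"test_id={test_id}",
        "-O", output,  # -O overwrites the output file if it exists
    ]


# "12345" is a placeholder test id, not a real one
command = build_crawl_command("soc", "ege", "12345", "ege_data.jsonl")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, cwd="goat/parser", check=True)` so that the crawl starts from the directory the parser expects.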

To run the dataset_demonstration.py script, execute the following command from the root directory:

python goat/parser/dataset_demonstration.py -f <parser output file name>

where <parser output file name> is the name of the JSONL file that the parser generated.

Leaderboard frontend

Structure

The leaderboard follows a structure similar to that of the Open LLM Leaderboard. It is a Gradio web app that runs in a HuggingFace Space. Database connection info is stored in environment variables.

Usage

First, install the frontend dependencies by running the following command:

pip install -e ".[frontend]"

In this app you can send a validation request for your model to the backend database. After some time, the validation result for your model will appear in the leaderboard once you reload the app.

To run the leaderboard web app, execute this command from the root directory (this assumes that you have set all environment variables needed for the database connection):

python -m goat.frontend.app
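Because the app reads its database settings from environment variables, a quick check before launch can give a clearer error than a failed connection. The sketch below reports which variables are unset. The names in `REQUIRED_VARS` are placeholders chosen for illustration. The actual variable names are defined by the project and are not listed in this README.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical variable names; replace them with the ones the project expects
REQUIRED_VARS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names from `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All database variables are set.")
```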

Leaderboard backend

Structure

After receiving a new validation request, the leaderboard backend validates the requested model on the GOAT dataset using the modified LM Evaluation Harness benchmark and the FastChat LLM-as-judge benchmark from the deepvk repositories. When validation finishes, it adds the resulting scores to the leaderboard.

Usage

First, install the backend dependencies by running the following commands:

pip install -e ".[backend]"
pip install -U wheel
pip install flash-attn==2.5.8

To run the leaderboard backend, execute this command from the root directory (this assumes that you have set all environment variables needed for the database connection):

python -m goat.backend.eval

Once started, the script listens for new validation requests in the database. When it receives a new request, it validates the requested model on the GOAT dataset and then adds the results to the leaderboard table in the database.
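The listen, validate, store cycle described above can be sketched as a simple polling loop. This is not the code of goat.backend.eval. The three callables stand in for the database query, the benchmark run, and the leaderboard write, and all function and parameter names here are hypothetical.

```python
import time
from typing import Callable, Dict, Optional

Scores = Dict[str, float]


def process_pending(
    fetch_next: Callable[[], Optional[str]],
    validate: Callable[[str], Scores],
    save: Callable[[str, Scores], None],
) -> int:
    """Handle every request that is pending right now; return how many were handled."""
    handled = 0
    while True:
        model_name = fetch_next()  # None means no pending request
        if model_name is None:
            return handled
        save(model_name, validate(model_name))
        handled += 1


def listen(
    fetch_next: Callable[[], Optional[str]],
    validate: Callable[[str], Scores],
    save: Callable[[str, Scores], None],
    poll_seconds: float = 60.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Poll for requests until `max_cycles` is reached (forever if it is None)."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        process_pending(fetch_next, validate, save)
        cycles += 1
        time.sleep(poll_seconds)
```

Passing the database access and the benchmark as callables keeps the loop testable with an in-memory queue, without a database or a GPU.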
