akylbekmaxutov/LLM-eval-using-Kazakh

LLM-eval-using-Kazakh

This repository provides the datasets and prompts used in the paper "Do LLMs Speak Kazakh? A Pilot Evaluation of Seven Models."


prompts.py: This script defines the prompts used in the experiments.

Functions:

math_english_instructions - prompt with English instructions for the 'NIS Math' dataset.

math_kazakh_instructions - prompt with Kazakh instructions for the 'NIS Math' dataset.

spelling_english_instructions - prompt with English instructions for the 'kkWikiSpell' dataset.

spelling_kazakh_instructions - prompt with Kazakh instructions for the 'kkWikiSpell' dataset.

belebele_english_instructions - prompt with English instructions for the 'Belebele' dataset.

belebele_kazakh_instructions - prompt with Kazakh instructions for the 'Belebele' dataset.

copa_english_instructions - prompt with English instructions for the 'kkCOPA' dataset.

copa_kazakh_instructions - prompt with Kazakh instructions for the 'kkCOPA' dataset.

flores_english_instructions - prompt with English instructions for the 'Flores-101' dataset.

flores_kazakh_instructions - prompt with Kazakh instructions for the 'Flores-101' dataset.

kazqad_english_instructions - prompt with English instructions for the 'KazQAD_part' dataset.

kazqad_kazakh_instructions - prompt with Kazakh instructions for the 'KazQAD_part' dataset.

kazqad_english_instructions_closed - prompt with English instructions for the 'KazQAD' dataset.
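
To illustrate the general shape of an instruction prompt for a multiple-choice task, here is a minimal sketch. The helper name `build_mc_prompt`, its parameters, and the instruction wording are all hypothetical. They are not taken from prompts.py, whose actual signatures and texts may differ.

```python
def build_mc_prompt(passage, question, options):
    """Hypothetical helper: assemble an English-instruction multiple-choice prompt.

    The wording below is illustrative only and is NOT the text used in prompts.py.
    """
    letters = "ABCD"
    lines = [
        "Read the passage and answer the question with a single letter.",
        "",
        f"Passage: {passage}",
        f"Question: {question}",
    ]
    # Label each answer option with a letter (A, B, C, D).
    for letter, option in zip(letters, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


example_prompt = build_mc_prompt(
    passage="Астана - Қазақстанның астанасы.",
    question="Қазақстанның астанасы қай қала?",
    options=["Алматы", "Астана", "Шымкент", "Ақтөбе"],
)
print(example_prompt)
```

A Kazakh-instruction variant would presumably keep the same structure and translate only the instruction line.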


Datasets for "Do LLMs Speak Kazakh? A Pilot Evaluation of Seven Models"

Belebele (Bandarkar et al., 2023)

Task: Multiple-choice QA
Size: 900
Metric: Accuracy
Language: Human-translated
Description: A multiple-choice reading comprehension dataset in which each question is based on a short passage and has four answer options. It is used to evaluate how well language models understand Kazakh text.
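
Accuracy for a multiple-choice task is the fraction of questions answered with the correct option. A minimal sketch, assuming predictions and gold answers are option letters. The function name `accuracy` and the case-insensitive, whitespace-stripped comparison are assumptions, not the paper's scoring code.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels (illustrative sketch)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    # Assumption: compare option letters ignoring case and surrounding whitespace.
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


print(accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"]))  # 0.75
```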

kkCOPA

Task: Causal reasoning
Size: 500
Metric: Accuracy
Language: Machine-translated
Description: A Kazakh version of the Choice of Plausible Alternatives (COPA) dataset, used to test commonsense causal reasoning. Each item gives a premise and two alternatives, and the model must choose the more plausible cause or effect.

NIS Math

Task: School Math
Size: 100
Metric: Accuracy
Language: Originally in Kazakh
Description: This dataset contains mathematical problems and prompts sourced from the Nazarbayev Intellectual Schools' curriculum, used to assess mathematical reasoning in Kazakh.

KazQAD_part (Yeshpanov et al., 2024)

Task: Reading comprehension
Size: 1,000
Metric: Token-level F1
Language: Originally in Kazakh
Description: KazQAD is a dataset designed to test the question-answering capabilities of language models in Kazakh, covering a wide range of topics.
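
Token-level F1 is commonly computed as the harmonic mean of token precision and recall between the predicted and reference answers. A minimal sketch of that common formulation follows. The function name `token_f1`, the lowercasing, and the whitespace tokenization are assumptions, and the paper's exact normalization may differ.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between two strings (illustrative sketch)."""
    # Assumption: lowercase and split on whitespace; no punctuation stripping.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(round(token_f1("Абай Құнанбайұлы", "Абай"), 2))  # 0.67
```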

KazQAD (Yeshpanov et al., 2024)

Task: Generative QA
Size: 1,927
Metric: Token-level recall
Language: Originally in Kazakh
Description: A subset of the KazQAD dataset, providing a focused selection of questions for more targeted evaluation.
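
Token-level recall suits generative QA because a model's answer is often longer than the reference; it measures how many reference tokens appear in the output. A minimal sketch, assuming lowercased whitespace tokens. The function name `token_recall` is hypothetical and the paper's tokenization may differ.

```python
from collections import Counter


def token_recall(prediction, reference):
    """Share of reference tokens that also occur in the prediction (sketch)."""
    # Assumption: lowercase and split on whitespace.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    return overlap / len(ref_tokens)


# A verbose but correct answer still gets full recall.
print(token_recall("Қазақстанның астанасы Астана қаласы", "Астана"))  # 1.0
```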

kkWikiSpell

Task: Spelling correction
Size: 160
Metric: Token-level Jaccard
Language: Originally in Kazakh
Description: A spelling correction dataset derived from Kazakh Wikipedia entries, aimed at evaluating the spelling and grammatical correction capabilities of language models in Kazakh.
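
The Jaccard index of two token sets is the size of their intersection divided by the size of their union. A minimal sketch for scoring a corrected sentence against the reference follows. The function name `token_jaccard`, the set-based treatment (duplicates ignored), and the whitespace tokenization are assumptions, not the paper's confirmed definition.

```python
def token_jaccard(prediction, reference):
    """Jaccard similarity between the token sets of two strings (sketch)."""
    # Assumption: whitespace tokenization, case preserved, duplicates ignored.
    pred_tokens = set(prediction.split())
    ref_tokens = set(reference.split())
    if not pred_tokens and not ref_tokens:
        return 1.0
    return len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)


# Two of the four distinct tokens are shared.
print(token_jaccard("a b c", "a b d"))  # 0.5
```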

Flores-101 (Goyal et al., 2022)

Task: Machine translation
Size: 500
Metric: BLEU
Language: Human-translated
Description: Part of the Flores-101 benchmark, which provides high-quality human translations for evaluating machine translation across many languages, including Kazakh.
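
BLEU combines modified n-gram precisions (usually up to 4-grams) with a brevity penalty. The sketch below is a simplified corpus-level version with one reference per sentence, whitespace tokenization, and no smoothing. The function names are hypothetical, and this is not the implementation used in the paper; scores from standard tools will differ because of tokenization and other details.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-1 scale (illustrative sketch)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(simple_corpus_bleu(["a b c d e"], ["a b c d e"]))  # 1.0
```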
