Automatic translation of Alpaca dataset into Kyrgyz language

Akyl-AI/kyrgyz-alpaca

Kyrgyz Alpaca

This repository is intended for research use only; it may not be used for commercial or entertainment purposes.

References

This work was made possible by the robust AI community in Kyrgyzstan and the contributions of individuals within the AkylAI project (by TheCramer.com). We also thank Stanford for their outstanding efforts, which allow us to make this dataset accessible to a global audience.

Dataset

Kyrgyz Alpaca can also be downloaded from here.

We used ChatGPT and Google Translate to translate alpaca_data.json into Kyrgyz. Although the translation is not perfect, we found that it strikes a reasonable balance between cost and quality. Translating the entire dataset into Kyrgyz cost approximately $700.00. To learn more about how the original dataset was created, visit the Stanford Alpaca page.
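As a rough illustration of what translating an Alpaca-style file involves, here is a minimal sketch in Python. It is not the pipeline used for this dataset. It assumes the standard Alpaca record layout (a JSON list of objects with `instruction`, `input`, and `output` fields), and `translate` is a hypothetical placeholder for whatever backend you choose (for example a ChatGPT or Google Translate wrapper you write yourself). The function names are illustrative only.

```python
import json
from typing import Callable, Dict, List

# Text fields of a standard Alpaca record.
FIELDS = ("instruction", "input", "output")


def translate_records(
    records: List[Dict[str, str]],
    translate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return copies of the records with each text field translated.

    `translate` is a hypothetical callable (assumption): it takes English
    text and returns the translated text.
    """
    translated = []
    for record in records:
        new_record = dict(record)  # leave the original record untouched
        for field in FIELDS:
            text = record.get(field, "")
            # Many Alpaca records have an empty "input"; skipping empty
            # strings avoids paying for pointless translation calls.
            new_record[field] = translate(text) if text.strip() else text
        translated.append(new_record)
    return translated


def translate_file(src_path: str, dst_path: str,
                   translate: Callable[[str], str]) -> None:
    """Translate an Alpaca-style JSON file and write the result."""
    with open(src_path, encoding="utf-8") as f:
        records = json.load(f)
    result = translate_records(records, translate)
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Kyrgyz (Cyrillic) text human-readable.
        json.dump(result, f, ensure_ascii=False, indent=2)
```

Passing the translator in as a callable makes it easy to swap backends or to test the loop with a stub before spending money on real API calls.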

Next

We are working with Kyrgyz linguists to improve the quality of the translation.

Please feel free to reach out to timur.turat@gmail.com if you are interested in any form of collaboration!

Citation

If you use the data or code from this repo, please cite it as follows:

@misc{kyrgyz-alpaca,
  author = {Khakim Davurov and Timur Turatali and Ulan Abdurazakov},
  title = {Kyrgyz Alpaca: Models and Datasets},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Akyl-AI/kyrgyz-alpaca}},
}
