KhanomTanLLM

KhanomTan (Thai name is ขนมตาล) + LLM

Image generated with FLUX.1 [dev]

KhanomTan LLM is a bilingual language model from PyThaiNLP, trained on Thai and English open-source datasets. The model is trained on public datasets only and is fully open source: we release the dataset, training pipeline, and models.

Codename: numfa-v2

Blog Post (Thai): https://pythainlp.org/2024-09-12-khanomtanllm/

Online Demo: https://huggingface.co/spaces/wannaphong/KhanomTanLLM-demo
Pretraining dataset: https://huggingface.co/datasets/wannaphong/KhanomTanLLM-pretrained-dataset
- Thai subset only: https://huggingface.co/datasets/wannaphong/KhanomTanLLM-pretrained-dataset-thai-subset
- List of datasets in the Thai subset: https://huggingface.co/collections/pythainlp/datasets-for-pretrained-thai-llm-65db96ab730386b492889a98
Pretraining script: https://github.com/wannaphong/EasyLM/tree/KhanomTanLLM-pretraining
Pretrained Models:
- 1B: https://huggingface.co/pythainlp/KhanomTanLLM-1B
- 3B: https://huggingface.co/pythainlp/KhanomTanLLM-3B
Instruct Models:
- Instruct dataset: wannaphong/KhanomTanLLM-Instruct-dataset
- SFT Script: https://github.com/PyThaiNLP/KhanomTanLLM/tree/main/finetuning
- 1B: https://huggingface.co/pythainlp/KhanomTanLLM-1B-Instruct
- 3B: https://huggingface.co/pythainlp/KhanomTanLLM-3B-Instruct/

Instruct Models

We fine-tuned the models on wannaphong/KhanomTanLLM-Instruct-dataset. The models have no safeguards, so use them at your own risk.

For the best results, we suggest the following settings:

temperature: 2 - 4
min_p: > 0.6
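
As a minimal sketch of how these settings could be passed to a model, the snippet below uses the Hugging Face `transformers` library. Several things here are assumptions rather than part of this README: that the instruct checkpoints load with `AutoModelForCausalLM`, that the installed `transformers` version supports the `min_p` generation argument, and that a plain-text prompt is acceptable (the prompt format expected by the instruct models is not described here). The concrete values `2.0` and `0.7` are simply picks from the suggested ranges, and the `generate` helper is hypothetical.

```python
# Sampling settings picked from the ranges suggested above.
# Assumption: 2.0 and 0.7 are illustrative picks, not values from the authors.
SUGGESTED_SAMPLING = {
    "do_sample": True,   # temperature and min_p only apply when sampling
    "temperature": 2.0,  # suggested range: 2 - 4
    "min_p": 0.7,        # suggested range: > 0.6
}


def generate(prompt, model_id="pythainlp/KhanomTanLLM-1B-Instruct", max_new_tokens=128):
    """Hypothetical helper: generate a completion with the suggested settings."""
    # Imported lazily so the settings above can be reused without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Assumption: a plain prompt is used; adapt this if the model expects a chat format.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        **SUGGESTED_SAMPLING,
    )

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("สวัสดีครับ"))
```

Running the script downloads the model weights on first use. Swapping `model_id` for the 3B instruct checkpoint listed above should work the same way, under the same assumptions.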

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). We used a TPU v4-64 to train the models.

Thank you to TPU Research Cloud and the EasyLM project! We used EasyLM to pretrain the models.
