Vietnamese-English Machine Translation with VinaLLaMA-7B

This repository contains the code and data-processing scripts for fine-tuning VinaLLaMA-7B and VinaLLaMA-7B-chat, the models from the paper "VinaLLaMA-7B: A Large-Scale Vietnamese-English Machine Translation Model" by Hieu Pham, Dat Quoc Nguyen, Thi Ngoc Diep Do, Minh Nguyen, and Son N. Tran, on the Vietnamese-English machine translation task.

The model is fine-tuned on three data sources: teencode and slang data from the social media dataset UIT-VSMEC (translated to English using GPT-4), synthetic data (generated using GPT-4), and the parallel dataset mt_eng_vietnamese (Hugging Face).

The instruction prompts used for fine-tuning are MTInstruct, AlignInstruct, HintInstruct, and ReviseInstruct, from the paper "Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages" by Zhuoyuan Mao and Yen Yu.
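
As a rough illustration of what MTInstruct-style and AlignInstruct-style training records can look like, the sketch below builds one of each in Python. The template wording, the field names `prompt` and `completion`, and the helper names `build_mt_instruct` and `build_align_instruct` are assumptions made for illustration, not the templates used in `src`. The word pair passed to the alignment helper could come from a word aligner such as fast_align.

```python
from typing import Dict


def build_mt_instruct(src_lang: str, tgt_lang: str,
                      src_text: str, tgt_text: str) -> Dict[str, str]:
    """Build one translation record (hypothetical template wording)."""
    prompt = (
        f"Translate from {src_lang} to {tgt_lang}.\n"
        f"{src_lang}: {src_text}\n"
        f"{tgt_lang}:"
    )
    # The completion is the reference translation the model should produce.
    return {"prompt": prompt, "completion": " " + tgt_text}


def build_align_instruct(src_lang: str, tgt_lang: str,
                         src_text: str, tgt_text: str,
                         src_word: str, tgt_word: str,
                         aligned: bool) -> Dict[str, str]:
    """Build one word-alignment judgement record (hypothetical template wording).

    `aligned` is the label: True for a word pair taken from the aligner's
    output, False for a deliberately mismatched (contrastive) pair.
    """
    prompt = (
        f"Given the following parallel sentence between {src_lang} and {tgt_lang}, "
        "judge whether the assertion is True or False.\n"
        f"{src_lang}: {src_text}\n"
        f"{tgt_lang}: {tgt_text}\n"
        f'Assertion: "{src_word}" can be aligned with "{tgt_word}" statistically.'
    )
    return {"prompt": prompt, "completion": " True" if aligned else " False"}


if __name__ == "__main__":
    mt = build_mt_instruct("Vietnamese", "English", "Xin chào", "Hello")
    al = build_align_instruct("Vietnamese", "English", "Xin chào", "Hello",
                              "chào", "Hello", aligned=True)
    print(mt["prompt"] + mt["completion"])
    print(al["prompt"] + al["completion"])
```

Keeping the prompt and the completion in separate fields makes it straightforward to mask the prompt tokens out of the loss during fine-tuning, if the training setup does that.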

