Question Answering - BARTpho

About The Project

My project is called Question Answering. I carried it out while studying in VietAI Advanced NLP Class 02. In a nutshell, the system answers a question about a given context passage.

Getting Started

To get started, you should have prior knowledge of Python and PyTorch. A few resources to get you started if this is your first Python or PyTorch project:

Outline

Data: UIT-ViQuAD2.0 dataset from VLSP2021.
Model: question_answering_bartpho_phobert is based on BARTpho and PhoBERT models.

According to the original paper, BARTpho-syllable and BARTpho-word are the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. BARTpho uses the "large" architecture and the pre-training scheme of the sequence-to-sequence denoising autoencoder BART, which makes it especially suitable for generative NLP tasks. For this downstream task, based on our experiments, we chose BARTpho-syllable over BARTpho-word and PhoBERT-large over PhoBERT-base.
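As background on how an encoder such as PhoBERT-large is commonly used for extractive QA: a span-prediction head gives every token a start score and an end score, and decoding picks the best valid span, predicting "no answer" when the score at the first special token wins (the SQuAD 2.0 convention for unanswerable questions). The sketch below shows only that decoding step in plain Python. It is a generic illustration, not the code in this repository; the function name best_span, the max_answer_len limit, and treating index 0 as the null position are assumptions.

```python
from typing import List, Optional, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the best answer span, or None.

    Assumption: index 0 is a special <s>/[CLS]-style token whose
    start + end score stands for "no answer" (SQuAD 2.0 convention).
    """
    null_score = start_logits[0] + end_logits[0]
    best_score = float("-inf")
    best: Optional[Tuple[int, int]] = None
    n = len(start_logits)
    for i in range(1, n):
        # Only spans that end at or after their start and are not too long.
        for j in range(i, min(n, i + max_answer_len)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    # Predict "no answer" when the null position scores at least as high.
    if best is None or null_score >= best_score:
        return None
    return best


# Toy logits for a 5-token input.
print(best_span([0.1, 0.0, 3.0, 0.5, 0.2], [0.1, 0.2, 0.4, 2.5, 0.3]))  # → (2, 3)
```

Real pipelines also map token indices back to character offsets in the context and usually compare the null score against a tuned threshold rather than a plain comparison.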

Installation and Run

Clone the repo

git clone https://github.com/phkhanhtrinh23/question_answering_bartpho_phobert.git

Use any code editor to open the folder question_answering_bartpho_phobert.
Run pip install -r requirements.txt to install the required packages.

Note: You can install transformers from the fast_tokenizers_BARTpho_PhoBERT_BERTweet branch of datquocnguyen's fork as follows:

git clone --single-branch --branch fast_tokenizers_BARTpho_PhoBERT_BERTweet https://github.com/datquocnguyen/transformers.git

cd transformers

pip3 install -e .

After you have received permission to download and use UIT-ViQuAD2.0, organize the dataset as follows:

data/
├── demo.json (not from UIT-ViQuAD2.0)
├── test.json
└── train.json
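If your copy of the data uses the SQuAD 2.0-style JSON layout (articles containing paragraphs, each with a context and a list of questions), the helper below flattens it into one record per question, which is convenient for inspecting the data before training. The layout and the function names flatten_squad and load_records are assumptions for illustration; check the actual keys in your train.json, and note that data.py may read the file differently.

```python
import json
from typing import Any, Dict, List


def flatten_squad(raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a SQuAD 2.0-style dict into one record per question.

    Assumed layout: {"data": [{"title": ..., "paragraphs": [{"context": ...,
    "qas": [{"id": ..., "question": ..., "answers": [...],
    "is_impossible": bool}]}]}]}
    """
    records = []
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                records.append({
                    "id": qa["id"],
                    "context": context,
                    "question": qa["question"],
                    "answers": qa.get("answers", []),
                    # A missing flag is treated as "answerable".
                    "is_impossible": qa.get("is_impossible", False),
                })
    return records


def load_records(path: str) -> List[Dict[str, Any]]:
    # The text is Vietnamese, so read the file explicitly as UTF-8.
    with open(path, encoding="utf-8") as f:
        return flatten_squad(json.load(f))
```

For example, len(load_records("data/train.json")) would give the number of questions, under the layout assumption above.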

Run python data.py to split train.json into new_train.json and valid.json in a 9:1 ratio.
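A 9:1 split like the one data.py performs can be sketched as follows. This is an illustration under assumptions, not the repository's code: it shuffles with a fixed seed and splits at the article level (so all questions about one passage land on the same side, avoiding context leakage between training and validation), while data.py may split by question instead. The names split_articles and split_file, the seed 42, and the SQuAD-style "data" key are hypothetical.

```python
import json
import random
from typing import Any, Dict, List, Tuple


def split_articles(
    articles: List[Dict[str, Any]], train_ratio: float = 0.9, seed: int = 42
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Shuffle articles with a fixed seed and split them train_ratio : rest."""
    shuffled = list(articles)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def split_file(src: str, train_out: str, valid_out: str) -> None:
    with open(src, encoding="utf-8") as f:
        raw = json.load(f)
    train, valid = split_articles(raw["data"])
    for path, part in ((train_out, train), (valid_out, valid)):
        with open(path, "w", encoding="utf-8") as f:
            # Keep other top-level keys (e.g. a version field) and write
            # Vietnamese characters as-is instead of \u escapes.
            json.dump({**raw, "data": part}, f, ensure_ascii=False)
```

Usage: split_file("data/train.json", "data/new_train.json", "data/valid.json"). The fixed seed makes the split reproducible across runs.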
Train the model with python train.py.
Validate the model with python validate.py, which scores the trained model on valid.json.
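The README does not say which score validate.py reports. SQuAD-style Exact Match (EM) and token-level F1 are the usual metrics for this kind of reading-comprehension data, so the sketch below shows how they are typically computed. It is a simplified, assumed version: unlike the official SQuAD script it does not remove English articles, since the answers here are Vietnamese, and the function names are hypothetical.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # Unanswerable questions: an empty prediction matches an empty answer.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    common = Counter(pred_tokens) & Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, f1_score("thủ đô Hà Nội", "Hà Nội") gives about 0.667 (precision 0.5, recall 1.0). When a question has several reference answers, the usual practice is to take the maximum score over them and then average over all questions.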

Note: You can pass any of the arguments defined in the ArgumentParser of train.py and validate.py to tune the results.
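Because argparse adds a --help flag automatically, python train.py --help should list the arguments the scripts actually accept. For readers new to argparse, here is a minimal parser showing how command-line flags override defaults. The flag names and default values are hypothetical and are not taken from this repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags for illustration only; train.py defines its own.
    parser = argparse.ArgumentParser(description="Train a QA model (sketch).")
    parser.add_argument("--batch_size", type=int, default=8)
    parser.add_argument("--lr", type=float, default=3e-5)
    parser.add_argument("--epochs", type=int, default=3)
    return parser
```

A script normally calls build_parser().parse_args() with no arguments so that sys.argv is used; passing an explicit list, such as ["--batch_size", "16"], returns a namespace with batch_size=16 and the other defaults unchanged.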

Run python inference.py to generate and evaluate predictions on test.json.

Note: Because the model cannot process the whole dataset at once, validate.py and inference.py run inference in batches.
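Batched inference of this kind can be sketched as below: split the records into fixed-size chunks, run the model on one chunk at a time, and collect a question-id to answer mapping. This is an assumed illustration, not the repository's implementation; predict_batch stands for a hypothetical callable that wraps the model and returns one answer string per record, and the "id" key follows the SQuAD-style layout assumed earlier.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def predict_all(
    records: Sequence[Dict[str, Any]],
    predict_batch: Callable[[List[Dict[str, Any]]], List[str]],
    batch_size: int = 16,
) -> Dict[str, str]:
    """Run the (hypothetical) predict_batch callable over records batch by batch."""
    predictions: Dict[str, str] = {}
    for batch in batched(records, batch_size):
        answers = predict_batch(batch)  # one answer string per record
        for record, answer in zip(batch, answers):
            predictions[record["id"]] = answer
    return predictions
```

In a PyTorch implementation, the model call inside predict_batch would typically run under torch.no_grad() so that activations are not kept for backpropagation, which is what lets larger batches fit in memory.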

SHOW TIME! You can now run your own demo website with Flask: python api.py. The website's UI comes from the templates folder. If possible, run it and share your results with me!

Demo

Some results:

Image 1 (from BARTpho-syllable)

Image 2 (from PhoBERT-large)

Image 3 (from PhoBERT-large)

Contribution

Contributions are what make GitHub such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the project
Create your Contribute branch: git checkout -b contribute/Contribute
Commit your changes: git commit -m 'add your messages'
Push to the branch: git push origin contribute/Contribute
Open a pull request

Contact

Email: phkhanhtrinh23@gmail.com
