Group 2: PLBART with AST

This project implements abstract syntax tree (AST) support for the PLBART model. Our model is fine-tuned on the CodeXGLUE dataset. The original PLBART code can be found here, and the original paper, Unified Pre-training for Program Understanding and Generation, is also linked. We developed and ran the project on WSL2.

Docker Setup

A Docker image is available for running the project. All dependencies are already installed in the image, so you can skip the local setup. The remaining steps for pre-processing, fine-tuning, and evaluation are the same as detailed below. An NVIDIA GPU is required to use CUDA.

Install Docker and run the following command to start a container from the image.

docker run -it --gpus all jmoreirakanaley/plbart-ast:final

Activate the conda environment for running the experiments.

conda activate plbart

Access the project.

cd ~/plbart-ast
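
Before starting the container, it can help to confirm that the host has the tools these steps rely on. The following is a minimal sketch and not part of the repository: the helper names `have_cmd` and `preflight` are hypothetical, and treating `nvidia-smi` on the PATH as a sign of a working NVIDIA driver is an assumption.

```shell
# Hypothetical preflight helpers; not part of the repository.

# Succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints one line per missing tool and returns non-zero if any are missing.
preflight() {
    local tool missing=0
    for tool in "$@"; do
        if ! have_cmd "$tool"; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `preflight docker nvidia-smi && docker run -it --gpus all jmoreirakanaley/plbart-ast:final` would only start the container when both tools are found.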

Local Setup

We can set up a conda environment to run the experiments. The first step is to download the dependencies. We assume Anaconda is installed. The additional requirements (listed in requirements.txt) can be installed by running the following script:

bash install_env.sh

Pre-processing Step

If you only wish to preprocess the dataset, follow these steps.

Step 1. Build parser

cd scripts/code_to_code/translation/parser
bash build.sh
cd ..

Step 2. Prepare the data

bash prepare.sh src_lang tgt_lang
cd ../../..
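
Here `src_lang` and `tgt_lang` are placeholders for the source and target language. The sketch below checks a language pair before `prepare.sh` is called. It is a hypothetical helper, not part of the repository, and it assumes the accepted codes are `java` and `cs`, as in the fine-tuning example later in this README.

```shell
# Hypothetical helper; the accepted codes (java, cs) are an assumption
# based on the example command in this README.
valid_pair() {
    local src="${1:-}" tgt="${2:-}"
    # Both arguments must be one of the assumed language codes.
    case "$src" in java|cs) ;; *) return 1 ;; esac
    case "$tgt" in java|cs) ;; *) return 1 ;; esac
    # Source and target must differ.
    [ "$src" != "$tgt" ]
}
```

From `scripts/code_to_code/translation`, `valid_pair java cs && bash prepare.sh java cs` would then run the preparation only for a well-formed pair.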

Fine-tuning

We fine-tune and evaluate PLBART with AST on the code-to-code downstream task from CodeXGLUE.

| Type | Task | Language(s) | Data | Scripts | Checkpoints | Results |
| --- | --- | --- | --- | --- | --- | --- |
| Code to Code | Code translation | Java, C# | [LINK] | [LINK] | [LINK] | [LINK] |

Step 1. Download original PLBART base model

cd pretrain
bash download.sh
cd ..

Step 2. Build parser for CodeBLEU evaluation (skip if already done)

cd scripts/code_to_code/translation/parser
bash build.sh
cd ../../../..

Step 3. Prepare the data, train and evaluate PLBART

cd scripts/code_to_code/translation
bash prepare.sh src_lang tgt_lang
bash run.sh GPU_IDS src_lang tgt_lang model_size
cd ../../..

Here is an example for fine-tuning from Java to C#:

bash run.sh 0 java cs base

Note: We fine-tuned our model on a single NVIDIA Quadro P1000 (4 GB) GPU, which took approximately 8 hours.
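
To fine-tune both translation directions, the `run.sh` call can be generated once per direction. The following sketch only prints the commands. The function name `fine_tune_commands` is hypothetical, and the `java` and `cs` codes are taken from the example above.

```shell
# Hypothetical helper: prints one run.sh command per translation direction.
# Arguments: GPU ids and model size, as in "bash run.sh GPU_IDS src tgt size".
fine_tune_commands() {
    local gpu_ids="$1" model_size="$2"
    local pair
    for pair in "java cs" "cs java"; do
        echo "bash run.sh $gpu_ids $pair $model_size"
    done
}
```

Calling `fine_tune_commands 0 base` prints the two commands so they can be reviewed first. Piping the output to `bash` from `scripts/code_to_code/translation` would run them one after the other, and on hardware like that in the note above each run may take several hours.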

Evaluation

If you only wish to evaluate the model against the CodeXGLUE benchmark, follow these steps.

Step 1. Download PLBART AST fine-tuned checkpoints

Download the fine-tuned models for Java to C# and C# to Java:

cd scripts/code_to_code/translation
bash download.sh

Step 2. Evaluate against CodeXGLUE

Run the evaluate.sh script:

bash evaluate.sh GPU_IDS src_lang tgt_lang
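
Since checkpoints are provided for both directions, the evaluation can be looped over both language pairs. This is a minimal sketch under stated assumptions: `evaluate_both` is a hypothetical name, the `java` and `cs` codes follow the fine-tuning example, and the optional second argument exists only so the command can be swapped out (for example with `echo` for a dry run).

```shell
# Hypothetical helper: runs the evaluation command for both directions
# and returns non-zero if either run fails.
evaluate_both() {
    local gpu_ids="$1" runner="${2:-bash evaluate.sh}"
    local src tgt status=0
    while read -r src tgt; do
        echo "== $src -> $tgt =="
        # stdin is redirected so the runner cannot consume the pair list.
        $runner "$gpu_ids" "$src" "$tgt" </dev/null || status=1
    done <<EOF
java cs
cs java
EOF
    return "$status"
}
```

`evaluate_both 0 echo` prints the arguments each run would receive without executing anything. `evaluate_both 0` calls `bash evaluate.sh` and should be run from `scripts/code_to_code/translation`.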
