langmo

The library for distributed pretraining and finetuning of language models.

Supported features:

  • vanilla pre-training of BERT-like models
  • distributed training on multi-node/multi-GPU systems
  • benchmarking/finetuning on the following tasks:
    • all GLUE tasks
    • MNLI + additional validation on HANS
    • more coming soon
  • siamese architectures for finetuning

Pretraining

Pretraining a model:

mpirun -np N python -m langmo.pretraining config.yaml
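The expected keys of the pretraining config.yaml are not listed here; as a rough sketch, reusing keys from the finetuning example below (an assumption; a real pretraining config will also need pretraining-specific entries such as corpus paths):

model_name: "bert-base-uncased"
batch_size: 32
path_results: ./logs
max_lr: 0.0005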

langmo saves two types of snapshots; one of them is in PyTorch Lightning format.

To resume a crashed/aborted pretraining session:

mpirun -np N python -m langmo.pretraining.resume path_to_run

Finetuning/Evaluation

Finetuning on one of the GLUE tasks:

mpirun -np N python -m langmo.benchmarks.GLUE config.yaml glue_task

supported tasks: cola, rte, stsb, mnli, mnli-mm, mrpc, sst2, qqp, qnli
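For example, to finetune on RTE with four MPI workers:

mpirun -np 4 python -m langmo.benchmarks.GLUE config.yaml rte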

The NLI task has an additional, special implementation which supports validation on the adversarial HANS dataset, as well as additional statistics for each label/heuristic.

To perform finetuning on NLI, run:

mpirun -np N python -m langmo.benchmarks.NLI config.yaml

Finetuning on extractive question-answering tasks:

mpirun -np N python -m langmo.benchmarks.QA config.yaml qa_task

supported tasks: squad, squad_v2
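For example, to finetune on SQuAD 2.0 with four MPI workers:

mpirun -np 4 python -m langmo.benchmarks.QA config.yaml squad_v2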

example config file:

model_name: "roberta-base"
batch_size: 32
cnt_epochs: 4
path_results: ./logs
max_lr: 0.0005
siamese: true
freeze_encoder: false
encoder_wrapper: pooler
shuffle: true

Automatic evaluation

langmo supports automatic scheduling of evaluation runs for a model saved in a given location, or for all snapshots found in the /snapshots folder. To configure langmo, the user has to create the following files:

./configs/langmo.yaml with an entry "submit_command" corresponding to the job submission command of a given cluster. If the file is not present, the jobs will not be submitted to the job queue, but executed immediately, one by one, on the same node.
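A minimal sketch of ./configs/langmo.yaml, assuming a Slurm cluster (the value is site-specific; on Fugaku it would be pjsub):

submit_command: "sbatch"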

./configs/auto_finetune.inc - the content of this file will be copied to the beginning of the job scripts. Place here directives for e.g. the Slurm job scheduler, such as which resource group to use, how many nodes to allocate, the time limit, etc. Set up all necessary environment variables, particularly NUM_GPUS_PER_NODE and PL_TORCH_DISTRIBUTED_BACKEND (MPI, NCCL or GLOO). Finally, add the mpirun command with the necessary options and end the file with a newline. The command to invoke langmo in the right way will be appended automatically.
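A minimal sketch of ./configs/auto_finetune.inc, assuming Slurm; the partition, node count, time limit and GPU count are placeholders to adapt to your cluster:

#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --time=02:00:00
export NUM_GPUS_PER_NODE=4
export PL_TORCH_DISTRIBUTED_BACKEND=NCCL
mpirun -np 4

The bare mpirun line is intentional: the langmo invocation is appended automatically, as described above.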

./configs/auto_finetune.yaml - any parameters (batch size etc.) to override the defaults in a fine-tuning run.
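A minimal sketch of ./configs/auto_finetune.yaml, reusing keys from the finetuning config above (which keys are honored here is an assumption):

batch_size: 16
cnt_epochs: 3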

To schedule evaluation jobs, run from the login node:

python -m langmo.benchmarks path_to_model task_name

The results will be saved in the eval/task_name/run_name/ subfolder of the folder where the model is saved.
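For example, assuming GLUE task names are accepted here and using a hypothetical model path:

python -m langmo.benchmarks ./logs/roberta_pretrain rte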

Fugaku notes

Add these lines before the return statement of the _compare_version function in pytorch_lightning/utilities/imports.py:

if str(pkg_version).startswith(version):
    return True

This sed command should do the trick:

sed -i -e '/pkg_version = Version(pkg_version.base_version/a\    if str(pkg_version).startswith(version):\n\        return True' \
  ~/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/imports.py
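After the patch, the relevant part of _compare_version should read roughly as follows; the surrounding lines are paraphrased from pytorch_lightning and may differ between versions:

pkg_version = Version(pkg_version.base_version)  # existing line matched by the sed pattern
if str(pkg_version).startswith(version):  # inserted: accept any version with a matching prefix
    return True
return op(pkg_version, Version(version))  # existing comparison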
