IgLM

Official repository for IgLM: Generative Language Modeling for Antibody Design.

The code and pre-trained models from this work are made available for non-commercial use under the terms of the JHU Academic Software License Agreement. For commercial inquiries, please contact Johns Hopkins Tech Ventures at awichma2@jhu.edu. The training and test datasets for IgLM are available for download here.

Try antibody sequence generation in Google Colab.

Setup

To use IgLM, install via pip:

pip install iglm

Alternatively, you can clone this repository and install the package locally:

$ git clone git@github.com:Graylab/IgLM.git
$ pip install ./IgLM

Command line usage

IgLM supports sequence infilling, sequence generation (with prompting), and sequence evaluation from the command line.

Re-design spans of an antibody sequence

To use IgLM to re-design spans of an antibody sequence, supply the fasta file, the fasta record ID corresponding to the sequence to design, the start index of the span (0-indexed), and the end index of the span (0-indexed, exclusive).

To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:

iglm_infill data/antibodies/1jpt/1jpt.fasta :H 98 106 --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100
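
The chain and species tokens contain square brackets, which some shells (zsh, for example) interpret as glob patterns, so quoting them can avoid surprises. The following is a minimal sketch that builds a quoted command line with Python's standard shlex module. The helper name build_infill_command is hypothetical and not part of IgLM; the flags are the ones shown in the command above.

```python
import shlex


def build_infill_command(fasta_path, record_id, start, end,
                         chain_token, species_token, num_seqs):
    """Return an iglm_infill command line with every argument shell-quoted."""
    args = [
        "iglm_infill", fasta_path, record_id, str(start), str(end),
        "--chain_token", chain_token,
        "--species_token", species_token,
        "--num_seqs", str(num_seqs),
    ]
    # shlex.quote leaves safe arguments untouched and wraps the rest in
    # single quotes, so "[HEAVY]" becomes '[HEAVY]'.
    return " ".join(shlex.quote(arg) for arg in args)


command = build_infill_command(
    "data/antibodies/1jpt/1jpt.fasta", ":H", 98, 106, "[HEAVY]", "[HUMAN]", 100
)
print(command)
```

The printed string can be pasted into a shell; only the bracketed tokens end up quoted.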

Full antibody sequence generation

IgLM can be used to generate full antibody sequences while conditioning on the chain type and species-of-origin.

To generate 100 unique human heavy chain sequences starting with EVQ:

iglm_generate --prompt_sequence EVQ --chain_token [HEAVY] --species_token [HUMAN] --num_seqs 100

To generate 100 unique nanobody sequences starting with QVQ:

iglm_generate --prompt_sequence QVQ --chain_token [HEAVY] --species_token [CAMEL] --num_seqs 100

Sequence evaluation

IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.

Full sequence log likelihood calculation:

iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --chain_token [HEAVY] --species_token [HUMAN]

Infilled sequence log likelihood calculation:

iglm_evaluate data/antibodies/1jpt/1jpt.fasta :H --start 98 --end 106 --chain_token [HEAVY] --species_token [HUMAN]

Package usage

IgLM can also be used as a Python package, which supports the use cases above and allows more flexible usage.

Re-design spans of an antibody sequence

To use IgLM to re-design spans of an antibody sequence, supply the sequence to design, the start index of the span (0-indexed), and the end index of the span (0-indexed, exclusive).

To generate 100 unique sequences of the anti-tissue factor antibody (1JPT) heavy chain with an IgLM-designed CDR3:

from iglm import IgLM

iglm = IgLM()

parent_sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)
num_seqs = 100

generated_seqs = iglm.infill(
    parent_sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
    num_to_generate=num_seqs,
)
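
Before running a design, it can help to check which residues a given span covers. The sketch below needs no model and assumes that the (start, end) pair follows Python slice semantics, which matches the "0-indexed, exclusive" description above; the variable names are illustrative.

```python
parent_sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
infill_range = (98, 106)

start, end = infill_range
# Residues that would be re-designed (end index is exclusive).
span = parent_sequence[start:end]
# Three residues before and four after the span, kept fixed during infilling.
left_flank = parent_sequence[start - 3:start]
right_flank = parent_sequence[end:end + 4]

print(span)         # DTAAYFDY
print(left_flank)   # CAR
print(right_flank)  # WGQG
```

For this sequence the span is the eight residues between the conserved CAR and WGQG motifs, consistent with the CDR3 described above.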

Full antibody sequence generation

IgLM can be used to generate full antibody sequences while conditioning on the chain type and species-of-origin.

To generate 100 unique human heavy chain sequences starting with EVQ:

from iglm import IgLM

iglm = IgLM()

prompt_sequence = "EVQ"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
num_seqs = 100

generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)

To generate 100 unique nanobody sequences starting with QVQ:

from iglm import IgLM

iglm = IgLM()

prompt_sequence = "QVQ"
chain_token = "[HEAVY]"
species_token = "[CAMEL]"
num_seqs = 100

generated_seqs = iglm.generate(
    chain_token,
    species_token,
    prompt_sequence=prompt_sequence,
    num_to_generate=num_seqs,
)
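
Generated sequences are often passed on to other tools as FASTA. The following sketch assumes that generated_seqs is a list of plain amino-acid strings (an assumption about the return value, so check it against your installed version). The helper to_fasta and the record prefix are hypothetical, and the short sequences are stand-ins rather than model output.

```python
def to_fasta(sequences, prefix="iglm_design"):
    """Format sequences as FASTA text, skipping exact duplicates."""
    seen = set()
    records = []
    for seq in sequences:
        if seq in seen:
            continue
        seen.add(seq)
        # Number records by their position among the unique sequences.
        records.append(f">{prefix}_{len(records)}\n{seq}\n")
    return "".join(records)


# Stand-in for the list returned by iglm.generate(...).
generated_seqs = ["EVQLVESGG", "EVQLQESGP", "EVQLVESGG"]

fasta_text = to_fasta(generated_seqs)
print(fasta_text, end="")
```

The resulting text can be written to disk with an ordinary file write.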

Sequence evaluation

IgLM can be used to calculate the log likelihood of a sequence given a chain type and species-of-origin.

Full sequence log likelihood calculation:

import math
from iglm import IgLM

iglm = IgLM()

sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"

log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
)
perplexity = math.exp(-log_likelihood)

Infilled sequence log likelihood calculation:

import math
from iglm import IgLM

iglm = IgLM()

sequence = "EVQLVESGGGLVQPGGSLRLSCAASGFNIKEYYMHWVRQAPGKGLEWVGLIDPEQGNTIYDPKFQDRATISADNSKNTAYLQMNSLRAEDTAVYYCARDTAAYFDYWGQGTLVTVS"
chain_token = "[HEAVY]"
species_token = "[HUMAN]"
infill_range = (98, 106)

log_likelihood = iglm.log_likelihood(
    sequence,
    chain_token,
    species_token,
    infill_range=infill_range,
)
perplexity = math.exp(-log_likelihood)
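
The conversion from log likelihood to perplexity can be illustrated without the model. This toy sketch assumes the log likelihood is the mean of the per-residue log probabilities; whether iglm.log_likelihood averages this way is an assumption suggested by the formula above, not something confirmed here. The probabilities are made up for illustration.

```python
import math

# Hypothetical per-residue probabilities assigned by a model.
token_probs = [0.5, 0.25, 0.125]

# Mean log probability over the scored residues.
mean_log_likelihood = sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity is the exponential of the negative mean log likelihood,
# i.e. the inverse geometric mean of the probabilities.
perplexity = math.exp(-mean_log_likelihood)

print(round(perplexity, 6))  # 4.0
```

Lower perplexity means the model finds the sequence, or the infilled span, more likely on average.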

Citing this work

@article{shuai2023iglm,
  title={IgLM: Infilling language modeling for antibody sequence design},
  author={Shuai, Richard W and Ruffolo, Jeffrey A and Gray, Jeffrey J},
  journal={Cell Systems},
  volume={14},
  number={11},
  pages={979--989},
  year={2023},
  publisher={Elsevier}
}
