p-IgGen

p-IgGen is an autoregressive language model for paired antibody sequences. This package provides utility functions for generating and scoring antibody sequences using p-IgGen, with model weights stored on HuggingFace.

Features

Generate full-length antibody sequences.
Generate a heavy chain given a light chain and vice versa.
Generate full-length antibody sequences given an initial sequence.
Calculate log likelihoods of sequences.
The VH and VL chains of generated sequences can optionally be separated using ANARCI.

Installation

We advise installing using a conda environment.

Prerequisites

Conda

Step-by-Step Setup

Create a new conda environment:

conda create -n my_env
conda activate my_env
conda install python pip -y

Install this repository:

pip install git+https://github.com/OliverT1/p-IgGen.git

Install optional ANARCI dependency (for --separate_chains option):
```
conda install -c bioconda anarci
```
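
Before relying on --separate_chains, it can help to confirm that ANARCI is visible from the active environment. The following is a minimal sketch, assuming the bioconda package exposes an executable named `ANARCI` and a Python module named `anarci`. The helper `anarci_available` is illustrative and not part of p-IgGen.

```python
import importlib.util
import shutil


def anarci_available() -> bool:
    """Return True if an ANARCI executable or Python module can be found."""
    # Assumption: the command-line tool is installed as "ANARCI".
    has_cli = shutil.which("ANARCI") is not None
    # Assumption: the importable module is named "anarci".
    has_module = importlib.util.find_spec("anarci") is not None
    return has_cli or has_module


if anarci_available():
    print("ANARCI found")
else:
    print("ANARCI not found; --separate_chains requires it")
```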

Usage

Command Line Interface

Generate Sequences

To generate new antibody sequences, use the piggen_generate command:

piggen_generate --output_file output_sequences.txt --n_sequences 100

Sequences are generated by default in direction VH->VL, from C-term to N-term. Alternatively, they can be generated in reverse from VL->VH, from N-term to C-term, using the --backwards flag. This allows generation given a VH or VL sequence of any length.

Note:

If --backwards is used, the --initial_sequence should be provided in reverse, starting from the N-term of the VL chain.
If --heavy_chain_file or --light_chain_file is used, this inversion is handled automatically, and the VH and VL chains should be provided in the standard direction.
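
As a minimal sketch of preparing an --initial_sequence for --backwards, assuming "in reverse" means a plain character-wise reversal of the amino-acid string: the helper `reverse_sequence` and the fragment below are illustrative placeholders, not part of p-IgGen.

```python
def reverse_sequence(seq: str) -> str:
    """Return the amino-acid string read in the opposite direction."""
    # Normalise whitespace and case before reversing.
    return seq.strip().upper()[::-1]


# Placeholder fragment used only to show the reversal.
fragment = "QVQLVQSG"
print(reverse_sequence(fragment))  # GSQVLQVQ
```

The reversed string would then be passed as the value of --initial_sequence.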

Options:

--developable: Use the developable model.
--heavy_chain_file FILE: File containing heavy chain sequences to generate light chains from.
--light_chain_file FILE: File containing light chain sequences to generate heavy chains from.
--initial_sequence TEXT: Initial sequence to generate from.
--n_sequences INTEGER: Number of sequences to generate, per input sequence if applicable.
--top_p FLOAT: Top-p sampling value (default: 0.95).
--temp FLOAT: Temperature for generation (default: 1.2).
--bottom_n_percent FLOAT: Bottom n percent of sequences to discard based on likelihood (default: 5).
--backwards: Generate sequences in reverse.
--output_file FILE: File to save the generated sequences (required).
--separate_chains: Output VH and VL sequences separately, requires ANARCI.
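
To give an intuition for --top_p and --temp, here is a generic sketch of temperature scaling followed by top-p (nucleus) sampling over a toy set of logits. It illustrates the general technique only. The function `sample_top_p` and the toy logits are assumptions for illustration and do not reflect how p-IgGen implements sampling internally.

```python
import math
import random


def sample_top_p(logits, top_p=0.95, temp=1.2, rng=random):
    """Sample one token using temperature scaling and top-p filtering."""
    # Temperature > 1 flattens the distribution; < 1 sharpens it.
    scaled = {tok: logit / temp for tok, logit in logits.items()}
    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    # Keep the smallest set of most likely tokens whose cumulative
    # probability reaches top_p; everything else is never sampled.
    kept, cumulative = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    tokens, weights = zip(*kept)
    # random.choices renormalises the weights of the kept tokens.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy logits over four residues, not model output.
toy_logits = {"A": 2.0, "G": 1.0, "S": 0.5, "W": -3.0}
print(sample_top_p(toy_logits))
```

Lower --top_p or --temp values make sampling more conservative, while higher values increase diversity.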

Using --bottom_n_percent requires --n_sequences to be at least 100; otherwise, this option is ignored.
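
The filtering rule can be sketched as follows, assuming each generated sequence comes with a log likelihood and that the 100-sequence threshold applies to the number of scored sequences. The function `drop_bottom_n_percent` and the toy scores are hypothetical and only illustrate the idea.

```python
def drop_bottom_n_percent(scored, bottom_n_percent=5.0, min_sequences=100):
    """Discard the lowest-likelihood share of (sequence, log_likelihood) pairs."""
    # Mirror the documented rule: below 100 sequences nothing is filtered.
    if len(scored) < min_sequences:
        return list(scored)
    n_drop = int(len(scored) * bottom_n_percent / 100)
    # Rank from most to least likely, then cut off the tail.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[: len(ranked) - n_drop]


# Toy scores: 100 placeholder names with log likelihoods 0, -1, ..., -99.
toy = [(f"seq{i}", -float(i)) for i in range(100)]
kept = drop_bottom_n_percent(toy)
print(len(kept))  # 95
```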

Calculate Log Likelihoods

To calculate the log likelihoods of sequences, use the piggen_likelihood command:

piggen_likelihood --sequence_file input_sequences.txt --output_file log_likelihoods.txt

Options:

--developable: Use the developable model.
--sequence_file FILE: The file containing sequences to calculate log likelihoods.
--batch_size INTEGER: Batch size for processing sequences.
--output_file FILE: File to save the log likelihoods.
--local: Load model from local path.
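
For an autoregressive model, the log likelihood of a sequence is generally the sum of the log probabilities of each token given the tokens before it. The sketch below shows that arithmetic on toy probabilities. Whether p-IgGen reports a summed or length-normalised value is not stated here, so treat `sequence_log_likelihood` as an illustration rather than the package's exact definition.

```python
import math


def sequence_log_likelihood(token_probs):
    """Sum log p(token_i | tokens before i) over a sequence."""
    return sum(math.log(p) for p in token_probs)


# Toy per-token conditional probabilities, not model output.
toy_probs = [0.5, 0.25, 0.5]
print(sequence_log_likelihood(toy_probs))  # equals log(0.0625), about -2.77
```

Values closer to zero indicate sequences the model considers more likely.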

Examples

Generate Light Chains for Provided Heavy Chains:

piggen_generate --heavy_chain_file heavy_chains.txt --n_sequences 5 --top_p 0.95 --temp 1.25 --output_file generated_sequences.txt

Heavy chains should be separated by newlines. Here, five light chains will be generated for each heavy chain.
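
A small sketch of preparing such an input file, with one sequence per line. The helper `write_chain_file` is hypothetical, and the two strings are short placeholders standing in for full heavy chain sequences.

```python
from pathlib import Path


def write_chain_file(path, sequences):
    """Write one sequence per line and return how many were written."""
    # Drop empty entries and surrounding whitespace.
    cleaned = [s.strip() for s in sequences if s.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n")
    return len(cleaned)


# Placeholder strings, not real heavy chains.
heavy_chains = ["EVQLVESGGG", "QVQLQQSGAE"]
n_inputs = write_chain_file("heavy_chains.txt", heavy_chains)

# With --n_sequences 5, five light chains are generated per heavy chain.
n_sequences = 5
print(n_inputs * n_sequences)  # 10
```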

Calculate Log Likelihoods for Sequences:

piggen_likelihood --sequence_file sequences.txt --batch_size 2 --output_file log_likelihoods.txt
