Modern Hopfield Networks and Attention for Immune Repertoire Classification

Michael Widrich (1), Bernhard Schäfl (1), Milena Pavlović (3, 4), Hubert Ramsauer (1), Lukas Gruber (1), Markus Holzleitner (1), Johannes Brandstetter (1), Geir Kjetil Sandve (4), Victor Greiff (3), Sepp Hochreiter (1, 2), Günter Klambauer (1)

(1) ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
(2) Institute of Advanced Research in Artificial Intelligence (IARAI)
(3) Department of Immunology, University of Oslo, Oslo, Norway
(4) Department of Informatics, University of Oslo, Oslo, Norway

This package provides:

  • modular and customizable DeepRC implementation for massive multiple instance learning problems, such as immune repertoire classification,
  • CNN and LSTM sequence embedding,
  • single- or multi-task settings (simple building-block principle),
  • support for custom datasets,
  • examples that you can quickly adapt to your problem settings.
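To give a rough intuition for attention-based pooling in a multiple instance learning setting, here is a minimal pure-Python sketch. It is not the package's implementation: the function names are hypothetical, the raw embeddings are used directly as keys, and the query is a fixed vector, whereas a real model would learn these.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, query):
    """Pool a bag of instance embeddings into a single vector.

    embeddings: list of equal-length float lists, one per instance (sequence)
    query: float list of the same length (fixed here, learned in a real model)
    """
    dim = len(query)
    # Scaled dot-product score of each instance against the query.
    scores = [
        sum(e_d * q_d for e_d, q_d in zip(emb, query)) / math.sqrt(dim)
        for emb in embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of the instance embeddings, one output value per dimension.
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Toy bag with three instances; the second one matches the query best.
bag = [[0.1, 0.0], [4.0, 4.0], [0.0, 0.2]]
pooled, weights = attention_pool(bag, query=[1.0, 1.0])
print("attention weights:", weights)
print("pooled representation:", pooled)
```

The pooled vector has a fixed size regardless of how many instances the bag contains, which is what allows a classifier to operate on bags with very different numbers of sequences.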

Will be added:

  • multiple attention heads/queries,
  • Integrated Gradients analysis (write me an email (widrich at ml.jku.at) if you urgently need a preliminary version).

Installation

pip

You can install this package via pip:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install git+https://github.com/ml-jku/DeepRC

To update your installation with dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --upgrade git+https://github.com/ml-jku/DeepRC

To update your installation without dependencies, you can use:

pip install --no-dependencies git+https://github.com/widmi/widis-lstm-tools
pip install --no-dependencies --upgrade git+https://github.com/ml-jku/DeepRC
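After installing, a quick check that the packages can be located may save some debugging time. The sketch below uses only the Python standard library. The import name deeprc matches the module paths used in the examples of this README; the import name widis_lstm_tools is an assumption and may need adjusting.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


# "widis_lstm_tools" is an assumed import name; adjust it if your
# installation exposes the package under a different name.
for name in ("deeprc", "widis_lstm_tools"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```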

Usage

To run the examples, download the GitHub repository as a .zip file, extract it, and navigate into the extracted directory (you should see a deeprc folder and the README.md there).

Can't wait? Examples are here: deeprc/examples/.

Training DeepRC on pre-defined datasets

You can train a DeepRC model on the pre-defined datasets of the DeepRC paper using one of the Python files in the folder deeprc/examples/examples_from_paper. The datasets will be downloaded automatically (please download them only once and then reuse the downloaded versions).

You can run tensorboard --logdir [results_directory] --port=6060 and open http://localhost:6060/ in your web browser to view the performance.

Real-world data with implanted signals

This category has the smallest dataset files and is a good starting point. Training a binary DeepRC classifier on dataset "0" of category "real-world data with implanted signals":

python3 -m deeprc.examples.examples_from_paper.cmv_with_implanted_signals 0 --n_updates 10000 --evaluate_at 2000
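If you want to run the same example for several datasets of this category, a small wrapper can build the command lines for you. This is only a sketch: the helper names are hypothetical, and the assumption that dataset IDs other than "0" (here "1") are valid is not confirmed by this README. By default, the commands are only printed and not executed.

```python
import subprocess
import sys

MODULE = "deeprc.examples.examples_from_paper.cmv_with_implanted_signals"


def build_command(dataset_id, n_updates=10000, evaluate_at=2000):
    """Build the argument list for one training run (same flags as above)."""
    return [
        sys.executable, "-m", MODULE, str(dataset_id),
        "--n_updates", str(n_updates),
        "--evaluate_at", str(evaluate_at),
    ]


def run_datasets(dataset_ids, dry_run=True):
    """Print (and optionally execute) one training command per dataset ID."""
    commands = [build_command(dataset_id) for dataset_id in dataset_ids]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError if a run fails.
            subprocess.run(command, check=True)
    return commands


# Dataset IDs are assumptions for illustration; set dry_run=False to train.
run_datasets([0, 1], dry_run=True)
```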

To get more information, you can use the help function:

python3 -m deeprc.examples.examples_from_paper.cmv_with_implanted_signals -h

LSTM-generated data

Training a binary DeepRC classifier on dataset "0" of category "LSTM-generated data":

python3 -m deeprc.examples.examples_from_paper.lstm_generated 0

Simulated immunosequencing data

Training a binary DeepRC classifier on dataset "0" of category "simulated immunosequencing data":

python3 -m deeprc.examples.examples_from_paper.simulated 0

Warning: The download size is ~20 GB per dataset!
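Given the size noted in the warning, it can be worth checking the free disk space before starting a download. The following sketch uses only the standard library; the helper name and the 20 GB threshold (taken from the approximate figure in the warning) are illustrative, and the extracted or converted data may need additional space.

```python
import shutil

REQUIRED_GB = 20  # approximate download size per dataset, from the warning


def has_free_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if has_free_space("."):
    print("Enough free space for one dataset download.")
else:
    print("Less than ~20 GB free; consider freeing space or another location.")
```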

Real-world data

Training a binary DeepRC classifier on dataset "real-world data":

python3 -m deeprc.examples.examples_from_paper.cmv

Training DeepRC on a custom dataset

You can train DeepRC on custom text-based datasets, such as the small example dataset deeprc/datasets/example_dataset. Specifications of the supported dataset formats are given here: deeprc/datasets/README.md.

You can change the dataset directory and task description in the examples listed below and start training a DeepRC model on your task:

Training a binary DeepRC classifier on a small random example dataset using 1D CNN sequence embedding:

python3 -m deeprc.examples.example_single_task_cnn

Training DeepRC in a multi-task setting on a small random example dataset using 1D CNN sequence embedding:

python3 -m deeprc.examples.example_multitask_cnn

Training DeepRC in a multi-task setting on a small random example dataset using LSTM sequence embedding:

python3 -m deeprc.examples.example_multitask_lstm

Datasets

The datasets will be automatically downloaded when running the examples from section "Training DeepRC on pre-defined datasets". You can also manually download the datasets here: https://ml.jku.at/research/DeepRC/datasets/. Please see our paper for descriptions of the datasets.

Structure

deeprc
      |--datasets : stores datasets
      |   |--example_dataset : Small example dataset
      |   |--README.md : Information on supported dataset formats
      |   |--splits_used_in_paper : Dataset splits as used in paper
      |--deeprc : DeepRC implementation
      |   |--architectures.py : DeepRC network architecture
      |   |--dataset_converters.py : Converter for text-based datasets
      |   |--dataset_readers.py : Tools for reading datasets
      |   |--predefined_datasets.py : Pre-defined datasets from paper
      |   |--task_definitions.py : Tools for defining the task to train DeepRC on
      |   |--training.py : Tools for training DeepRC model
      |--examples : DeepRC examples
      |   |--examples_from_paper : Examples on datasets used in paper
      |--neurips_poster.pdf : Poster from NeurIPS2020 poster session

Note

I'm currently cleaning up and uploading the code for the paper. There might be (and probably are) some bugs which will be fixed soon. If you need help with running DeepRC in the meantime, feel free to write me an email (widrich at ml.jku.at).

Best wishes,

Michael

Requirements

I have relaxed the required package versions, so the requirements now also allow untested versions. Please see the list below for the tested package versions, and let me know if a newer package version fails.