crispr-bean

License: AGPL v3

bean improves CRISPR pooled screen analysis by 1) unconfounding variable per-guide editing outcomes using the genotypic outcome observed in the reporter sequence and 2) accurately modeling the screen procedure.

[Figure: reporter construct]

Overview

bean supports end-to-end analysis of pooled sorting screens, with or without reporter.

[Figure: bean workflow diagram (dag_bean_v2.svg)]

bean provides the following subcommands; each is described in the full documentation.

  1. count, count-samples: Base-editing-aware mapping of guides, optionally with reporter, from .fastq files.
    • create-screen creates a minimal ReporterScreen object from a flat gRNA count file. Note that allele counts are not included this way, so many functionalities involving allele and edit counts are not supported.
  2. profile: Profile editing preferences of your editor.
  3. qc: Quality control report and filtering out / masking of aberrant samples and guides.
  4. filter: Filter reporter alleles; essential for tiling mode, which allows for all alleles generated from a gRNA.
  5. run: Quantify targeted variants' effect sizes from screen data. See the documentation for more about the model & output.
  • Screen data is saved as a ReporterScreen object in the pipeline. BEAN stores mapped gRNA and allele counts in a ReporterScreen object, which is compatible with AnnData.
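
The order of the steps above can be summarized in a small Python sketch. The names `STEPS` and `planned_steps` and the `library_design` argument are hypothetical and are not part of bean. The sketch assumes that each subcommand is invoked as `bean <subcommand>`, that the listed order is the usual run order, and that `filter` is only needed in tiling mode. It builds command prefixes only and does not supply the arguments each subcommand requires; see the documentation for those.

```python
# Subcommands in the order they are listed in this README.
# Assumption: this listing order is also the usual order of execution.
STEPS = ["count-samples", "profile", "qc", "filter", "run"]


def planned_steps(library_design):
    """Return `bean <subcommand>` prefixes for a 'variant' or 'tiling' library.

    Hypothetical helper for illustration; it is not provided by bean.
    """
    if library_design not in ("variant", "tiling"):
        raise ValueError("library_design must be 'variant' or 'tiling'")
    # Assumption: `filter` is only included for tiling libraries.
    steps = [s for s in STEPS if s != "filter" or library_design == "tiling"]
    return [["bean", s] for s in steps]


for cmd in planned_steps("tiling"):
    print(" ".join(cmd))
```

Each prefix would still need the input files and options documented for that subcommand before it could be run.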

Installation

First install PyTorch. Then install bean from PyPI:

pip install crispr-bean

For the latest version of bean (and for the test files in tests/data), install from GitHub:

git clone https://github.com/pinellolab/crispr-bean.git
cd crispr-bean
pip install -e .

Documentation

See the documentation for tutorials and API references.

Tutorials

Library design | Selection | Reporter | Tutorial link
GWAS variant library | FACS sorting | Yes/No | GWAS variant screen
Coding sequence tiling library | FACS sorting | Yes/No | Coding sequence tiling screen
GWAS variant library | Survival / Proliferation | Yes/No | GWAS variant screen
Coding sequence tiling library | Survival / Proliferation | Yes/No | Coding sequence tiling screen
Perturbation library without reporter | FACS sorting | No | No reporter screen
Integration of disjoint libraries | Any | Any | Feeding custom prior

Also see the notebook that visualizes screen analysis results.

Library design: variant or tiling?

The bean filter and bean run steps depend on the type of gRNA library design. BEAN supports two modes of running:

  1. variant library: Several gRNAs tile each of the targeted variants. Only the editing rate of the target variant is considered and the bystander effects are ignored.

    • ➕ Increases power for your target variant, as the signal is not distributed across likely no-effect bystanders.
    • ➖ Ignores potential bystander effect
    • ✔️ Suitable for noncoding GWAS variant screens.
  2. tiling library: gRNAs densely tile a long region (e.g. gene(s), exon(s), coding sequence(s)). Bystander edits are considered to obtain alleles with significant fractions. Edited alleles can be "translated" to output coding variants.

    • ➕ Considers bystander effect
    • ➖ If the library results in alleles that are not diverse enough across gRNAs, the signal will likely be diluted across all variants in those alleles (e.g., an allele "GGGGG" with a single gRNA score will distribute its score across 5 G's).
    • ✔️ Suitable for coding variant screens with tiling design.
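
The dilution described for tiling libraries can be pictured with a toy calculation. This is a minimal sketch and not BEAN's statistical model: the function name `spread_score`, the position labels, and the even split of the score are all assumptions made for illustration.

```python
def spread_score(edited_positions, allele_score):
    """Split one allele-level score evenly across its edited positions.

    Toy assumption for illustration only; bean does not expose this function
    and its model does not necessarily split scores evenly.
    """
    if not edited_positions:
        raise ValueError("the allele must contain at least one edited position")
    share = allele_score / len(edited_positions)
    return {pos: share for pos in edited_positions}


# One gRNA whose only observed allele edits five G's at once ("GGGGG").
per_variant = spread_score(["G1", "G2", "G3", "G4", "G5"], allele_score=1.0)
for position, score in per_variant.items():
    print(position, score)
```

With only one allele observed, nothing distinguishes the five positions, so each receives the same share. Alleles that differ across gRNAs would be needed to attribute the effect to individual variants.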

Using BEAN as Python module

import bean as be

# Load a ReporterScreen object (AnnData-compatible) from an .h5ad file.
cdata = be.read_h5ad("bean_counts_sample.h5ad")

The Python package bean supports multiple data wrangling functionalities for ReporterScreen objects. See the ReporterScreen API tutorial for more detail.

Run time

  • Installation takes 14.4 mins, with PyTorch already installed, on a Dell XPS 13 running Ubuntu under WSL.
  • bean run takes 4.6 mins with the --scale-by-acc flag on a Dell XPS 13 running Ubuntu under WSL, for a variant screen dataset with 3455 guides and 6 replicates with 4 sorting bins.
  • The full pipeline takes 90.1 s in GitHub Actions for a toy dataset of 2 replicates and 30 guides.

Contributing

See CHANGELOG for recent updates. If you have questions or feature requests, please open an issue. Pull requests are welcome.

Citation

If you have used BEAN for your analysis, please cite:
Ryu, J., Barkal, S., Yu, T. et al. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet (2024). https://doi.org/10.1038/s41588-024-01726-6