---
title: "crisprBwa: alignment of gRNA spacer sequences using BWA"
output:
  github_document:
    toc: true
bibliography: vignettes/references.bib
---
Authors: Jean-Philippe Fortin
Date: July 13, 2022
# Overview of crisprBwa
`crisprBwa` provides two main functions to align short DNA sequences to
a reference genome using the short read aligner BWA-backtrack [@bwa]
and return the alignments as R objects: `runBwa` and `runCrisprBwa`.
It uses the Bioconductor package `Rbwa` to access the BWA program
in a platform-independent manner, so users do not need to install
BWA before using `crisprBwa`.
The latter function (`runCrisprBwa`) is specifically designed
to map and annotate CRISPR guide RNA (gRNA) spacer sequences using
CRISPR nuclease objects and CRISPR genomic arithmetics defined in
the Bioconductor package [crisprBase](https://github.com/crisprVerse/crisprBase).
This enables a fast and accurate on-target and off-target search of
gRNA spacer sequences for virtually any type of CRISPR nuclease.
It also provides an off-target search engine for our main gRNA design package [crisprDesign](https://github.com/crisprVerse/crisprDesign) of the
[crisprVerse](https://github.com/crisprVerse) ecosystem. See the
`addSpacerAlignments` function in `crisprDesign` for more details.
# Installation and getting started
## Software requirements
### OS Requirements
This package is supported on macOS and Linux only.
It was developed and tested on R version 4.2.1.
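As a quick sanity check before installing, the platform requirement above can be tested from within R. This is only a minimal sketch: `is_supported_os` is a hypothetical helper written for illustration and is not part of `crisprBwa`.

```{r}
# Hypothetical helper (not part of crisprBwa): TRUE when the operating
# system is one of the platforms listed above.
is_supported_os <- function(sysname = Sys.info()[["sysname"]]) {
    # Sys.info() reports "Darwin" on macOS and "Linux" on Linux
    sysname %in% c("Darwin", "Linux")
}

# Check the current machine
is_supported_os()
```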
## Installation from Bioconductor
`crisprBwa` can be installed from the Bioconductor devel branch
using the following commands in a fresh R session:
```{r, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version="devel")
BiocManager::install("crisprBwa")
```
# Building a bwa index
To use `runBwa` or `runCrisprBwa`, users need to first build a BWA
genome index. For a given genome, this step has to be done only once.
The `Rbwa` package conveniently provides the function `bwa_build_index`
to build a BWA index for any custom genome stored in a FASTA file.
As an example, we build a BWA index for a small portion of human
chromosome 12 (`chr12.fa` file provided in the `crisprBwa` package) and
save the index files with the prefix `chr12` in a temporary directory:
```{r}
library(Rbwa)
fasta <- system.file(package="crisprBwa", "example/chr12.fa")
outdir <- tempdir()
index <- file.path(outdir, "chr12")
Rbwa::bwa_build_index(fasta,
                      index_prefix=index)
```
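Because the index is identified by a file prefix rather than a single file, it can be useful to confirm that the index files exist before running an alignment. The sketch below assumes that `Rbwa` writes the same five files as the command-line `bwa index` (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`). `bwa_index_files` is a hypothetical helper, not a function from `Rbwa` or `crisprBwa`, and the demonstration uses empty placeholder files so that it runs on its own.

```{r}
# Hypothetical helper (not part of Rbwa or crisprBwa): report which of the
# expected BWA index files exist for a given index prefix.
bwa_index_files <- function(index_prefix) {
    # Extensions written by `bwa index`; assumed to be the same for Rbwa
    exts <- c("amb", "ann", "bwt", "pac", "sa")
    paths <- paste0(index_prefix, ".", exts)
    stats::setNames(file.exists(paths), basename(paths))
}

# Self-contained demonstration using empty placeholder files
demo_prefix <- file.path(tempdir(), "demoIndex")
invisible(file.create(paste0(demo_prefix, ".",
                             c("amb", "ann", "bwt", "pac", "sa"))))
all(bwa_index_files(demo_prefix))
```

Under the assumption stated above, calling `bwa_index_files(index)` on the prefix used in the previous chunk should report all five files as present.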
To learn how to create a BWA index for a complete genome or transcriptome,
please visit our [tutorial page](https://github.com/crisprVerse/Tutorials/tree/master/Building_Genome_Indices).
# Alignment using `runCrisprBwa`
As an example, we align 5 spacer sequences (of length 20bp) to the
custom genome built above, allowing a maximum of 3 mismatches between the
spacer and protospacer sequences.
We specify that the search is for the wildtype Cas9 (SpCas9) nuclease
by providing the `CrisprNuclease` object `SpCas9` available through the
`crisprBase` package. The argument `canonical=FALSE` specifies that
non-canonical PAM sequences are also considered (NAG and NGA for SpCas9).
The function `getAvailableCrisprNucleases` in `crisprBase` returns a character
vector with the names of the `CrisprNuclease` objects available in `crisprBase`.
We also need to provide a `BSgenome` object corresponding to the reference
genome used for alignment to extract protospacer and PAM sequences of the
target sequences.
```{r}
library(crisprBwa)
library(BSgenome.Hsapiens.UCSC.hg38)
data(SpCas9, package="crisprBase")
crisprNuclease <- SpCas9
bsgenome <- BSgenome.Hsapiens.UCSC.hg38
spacers <- c("AGCTGTCCGTGGGGGTCCGC",
"CCCCTGCTGCTGTGCCAGGC",
"ACGAACTGTAAAAGGCTTGG",
"ACGAACTGTAACAGGCTTGG",
"AAGGCCCTCAGAGTAATTAC")
runCrisprBwa(spacers,
             bsgenome=bsgenome,
             crisprNuclease=crisprNuclease,
             n_mismatches=3,
             canonical=FALSE,
             bwa_index=index)
```
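To make the `n_mismatches` and `canonical` settings more concrete, here is a base R sketch of the two underlying ideas: counting mismatches between a spacer and a protospacer, and distinguishing the canonical SpCas9 PAM (NGG) from the non-canonical ones mentioned above (NAG and NGA). It is an illustration only and does not reflect how `crisprBwa` implements the search. Both `count_mismatches` and `classify_pam` are hypothetical helpers.

```{r}
# Hypothetical helper: number of mismatched positions between two
# DNA sequences of equal length
count_mismatches <- function(a, b) {
    stopifnot(nchar(a) == nchar(b))
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: classify a 3-nt PAM for SpCas9.
# NGG is canonical; NAG and NGA are the non-canonical PAMs named above.
classify_pam <- function(pam) {
    if (grepl("^[ACGT]GG$", pam)) {
        "canonical"
    } else if (grepl("^[ACGT](AG|GA)$", pam)) {
        "non-canonical"
    } else {
        "not recognized"
    }
}

# The third and fourth spacers from the example differ at one position
count_mismatches("ACGAACTGTAAAAGGCTTGG", "ACGAACTGTAACAGGCTTGG")

# Classify a few example PAM sequences
vapply(c("TGG", "CAG", "AGA", "TTT"), classify_pam, character(1))
```

With `canonical=TRUE`, only sites followed by a PAM of the first kind would be expected to be retained. With `canonical=FALSE`, sites with NAG or NGA PAMs are considered as well.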
# Applications beyond CRISPR
The function `runBwa` is similar to `runCrisprBwa`,
but does not impose constraints on PAM sequences.
It can be used to search for any short read sequence in a genome.
## Example using RNAi (siRNA design)
Seed-related off-targets, caused by mismatch tolerance outside of the
seed region, are a well-studied and well-characterized problem in RNA
interference (RNAi) experiments. `runBwa` can be used to map shRNA/siRNA seed
sequences to reference genomes to predict putative off-targets:
```{r, eval=TRUE}
seeds <- c("GTAAGCGGAGTGT", "AACGGGGAGATTG")
runBwa(seeds,
       n_mismatches=2,
       bwa_index=index)
```
# Reproducibility
```{r}
sessionInfo()
```
# References