<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>BarcodedIgReC Manual</title>
<style type="text/css">
.code {
background-color: lightgray;
}
</style>
<style>
</style>
</head>
<body>
<h1>BarcodedIgReC manual</h1>
1. <a href="#intro">What is BarcodedIgReC?</a><br>
2. <a href="#install">Installation</a><br>
2.1. <a href="#test_datasets">Verifying your installation</a><br>
3. <a href="#usage">BarcodedIgReC usage</a><br>
3.1. <a href="#basic_opts">Basic options</a><br>
3.2. <a href="#advanced_opts">Advanced options</a><br>
3.3. <a href="#run_examples">Examples</a><br>
3.4. <a href="#output">Output files</a><br>
4. <a href="#citation">Citation</a><br>
5. <a href="#feedback">Feedback and bug reports</a><br>
<!--- ----------------- -->
<h2 id="intro">1. What is BarcodedIgReC?</h2>
<a href="barcodedIgReC.html">BarcodedIgReC</a> is a modification of <a href="index.html">IgReC</a>, a full-length antibody repertoire construction tool, adapted for barcoded datasets.
The BarcodedIgReC pipeline is shown below:<br>
<p align = center>
<img src="barigrec_figures/BarcodedIgReC_pipeline.png" alt="BarcodedIgReC_pipeline" width = 70%>
</p>
<h4>Input:</h4>
<p align="justify">
    BarcodedIgReC takes as input demultiplexed paired-end or single reads with unique molecular identifiers (UMIs).
    <b>Please note that BarcodedIgReC constructs a full-length repertoire and
    expects input reads to cover the variable region of an antibody/TCR.</b>
</p>
<h4>Output:</h4>
<p align = justify>
    BarcodedIgReC corrects sequencing and amplification errors and groups together reads corresponding to identical antibodies.
    The constructed repertoire is thus a set of <i>antibody clusters</i>, each characterized by its
    <i>sequence</i>, <i>read multiplicity</i> and <i>molecule multiplicity</i>.
    Read multiplicity is the number of reads in an antibody cluster, whereas
    molecule multiplicity is an estimate of the number of RNA molecules related to the cluster.
    BarcodedIgReC provides the user with the following information about the constructed repertoire:
</p>
<ul>
<li>antibody clusters: antibody sequences with read multiplicities,</li>
<li>antibody clusters: antibody sequences with molecule multiplicities,</li>
    <li>read-cluster map: information about the contents of each antibody cluster.</li>
<!--<li>highly abundant antibody clusters,</li>-->
<!--<li>super reads: groups of identical input reads with high coverage.</li>-->
</ul>
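<p align="justify">
    The difference between the two multiplicities can be illustrated with a small Python 3 sketch.
    It is not part of BarcodedIgReC: the <code>(read_id, umi, cluster_id)</code> record layout and the function name are hypothetical,
    and counting distinct UMIs is only a simplified stand-in for the molecule estimate the tool actually computes.
</p>

```python
from collections import defaultdict


def cluster_multiplicities(records):
    """Return {cluster_id: (read_multiplicity, molecule_multiplicity)}.

    records: iterable of (read_id, umi, cluster_id) tuples (hypothetical layout).
    Molecule multiplicity is approximated by the number of distinct UMIs.
    """
    read_counts = defaultdict(int)
    umis = defaultdict(set)
    for read_id, umi, cluster_id in records:
        read_counts[cluster_id] += 1
        umis[cluster_id].add(umi)
    return {c: (read_counts[c], len(umis[c])) for c in read_counts}


# Toy data: cluster 0 has three reads from two UMIs, cluster 1 has one read.
records = [
    ("r1", "AAAC", 0),
    ("r2", "AAAC", 0),
    ("r3", "GGTT", 0),
    ("r4", "CCAT", 1),
]
multiplicities = cluster_multiplicities(records)
for cluster_id in sorted(multiplicities):
    n_reads, n_molecules = multiplicities[cluster_id]
    print(cluster_id, n_reads, n_molecules)
```

<p align="justify">
    In this toy input, cluster 0 has read multiplicity 3 but molecule multiplicity 2, because two of its reads share a UMI.
</p>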
<h4>Stages:</h4>
The BarcodedIgReC pipeline consists of the following steps:
<ol>
    <li><b>VJ Finder</b>: cleaning input reads by aligning them against Ig germline genes.</li>
    <li><b>Barcode clustering</b>: correcting errors in reads sharing a barcode,
        correcting barcode errors, and handling identical or close barcodes assigned to unrelated molecules.
        Chimeric reads are also discarded at this step.
        As a result, all reads are grouped by their original molecules.
    </li>
    <li><b>IgReC</b>: grouping very close molecules, thus correcting minor remaining errors.
        This step also produces the final output.
    </li>
</ol>
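<p align="justify">
    As a rough illustration of the idea behind barcode clustering, the Python 3 sketch below groups reads by UMI and takes a column-wise majority vote.
    This is a deliberately simplified toy, not the algorithm BarcodedIgReC implements: it assumes equal-length reads,
    ignores barcode errors, barcode collisions and chimeras, and all names in it are hypothetical.
</p>

```python
from collections import Counter, defaultdict


def consensus(seqs):
    """Column-wise majority vote over equal-length sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def consensus_by_barcode(reads):
    """reads: iterable of (umi, sequence) pairs; returns {umi: consensus}."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return {umi: consensus(seqs) for umi, seqs in groups.items()}


# Three reads share the UMI "AAAC"; one of them carries a single error (G to C).
reads = [
    ("AAAC", "ACGTAC"),
    ("AAAC", "ACGTAC"),
    ("AAAC", "ACCTAC"),
    ("GGTT", "TTGACA"),
]
molecules = consensus_by_barcode(reads)
print(molecules)
```

<p align="justify">
    The erroneous base in the third read is outvoted by the other two reads with the same UMI, so each UMI yields one corrected sequence.
</p>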
<!--<p align="justify">-->
<!--You can find details of IgReC algorithm in <a href = "http://bioinformatics.oxfordjournals.org/content/31/12/i53.long">our paper</a>.-->
<!--</p>-->
<!--- ---------------------------------------------------------------- -->
<!--- ---------------------------------------------------------------- -->
<a id="install"></a>
<h2>2. Installation</h2>
BarcodedIgReC has the following dependencies:
<ul>
<li>64-bit Linux system</li>
<li>g++ (version 4.7 or higher)</li>
<li>cmake (version 2.8.8 or higher)</li>
<li>Python 2 (version 2.7 or higher), including:</li>
<ul>
<li><a href = "http://biopython.org/wiki/Download">BioPython</a></li>
<li><a href = "http://matplotlib.org/users/installing.html">Matplotlib</a></li>
<li><a href = "http://www.numpy.org/">NumPy</a></li>
<li><a href = "http://www.scipy.org/install.html">SciPy</a></li>
</ul>
</ul>
To install BarcodedIgReC, type:
<pre class="code">
<code>
./prepare_cfg
</code>
</pre>
and:
<pre class="code">
<code>
make
</code>
</pre>
<a id="test_datasets"></a>
<h3>2.1. Verifying your installation</h3>
For testing purposes, BarcodedIgReC comes with a toy dataset. <br><br>
To try BarcodedIgReC on the test dataset, run:
<pre class="code"><code>
./barcoded_igrec.py --test
</code>
</pre>
If the installation of BarcodedIgReC is successful, you will find the following information at the end of the log:
<pre class="code">
<code>
Thank you for using BarcodedIgReC!
Log was written to barigrec_test/igrec.log
</code>
</pre>
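<p>
    If you want to check the outcome of a run from a script, a minimal Python 3 sketch such as the following may help.
    It only looks for the success message quoted above near the end of the log text;
    the function name and the choice of inspecting the last five non-empty lines are assumptions made for illustration.
</p>

```python
SUCCESS_LINE = "Thank you for using BarcodedIgReC!"


def run_succeeded(log_text, tail_size=5):
    """True if the success message appears among the last non-empty log lines.

    tail_size is an arbitrary illustrative choice, not a documented value.
    """
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return any(SUCCESS_LINE in line for line in lines[-tail_size:])


# Example with an in-memory log; in practice read barigrec_test/igrec.log.
sample_log = (
    "some earlier log output\n"
    "Thank you for using BarcodedIgReC!\n"
    "Log was written to barigrec_test/igrec.log\n"
)
print(run_succeeded(sample_log))
```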
<!--- ---------------------------------------------------------------- -->
<!--- ---------------------------------------------------------------- -->
<a id="usage"></a>
<h2>3. BarcodedIgReC usage</h2>
<p>
    BarcodedIgReC takes as input demultiplexed barcoded Illumina reads covering the variable region of an antibody and constructs a repertoire
    in <a href = "ig_repertoire_constructor_manual.html#repertoire_files">CLUSTERS.FA and RCM format</a>.
</p>
To run BarcodedIgReC, type:
<pre class="code">
<code>
    ./barcoded_igrec.py [options] -s &lt;single_reads.fastq&gt; -o &lt;output_dir&gt;
</code>
</pre>
OR
<pre class="code">
<code>
    ./barcoded_igrec.py [options] -1 &lt;left_reads.fastq&gt; -2 &lt;right_reads.fastq&gt; -o &lt;output_dir&gt;
</code>
</pre>
<!--- --------------------- -->
<a id="basic_opts"></a>
<h3>3.1. Basic options:</h3>
<code>-s &lt;single_reads.fastq&gt;</code><br>
FASTQ file with single Illumina reads (required for single-read input).
<br><br>
<code>-1 &lt;left_reads.fastq&gt; -2 &lt;right_reads.fastq&gt;</code><br>
FASTQ files with paired-end Illumina reads (required for paired-end input).
<br><br>
<code>-o / --output &lt;output_dir&gt;</code><br>
Output directory (required).
<br><br>
<code>-t / --threads &lt;int&gt;</code><br>
The number of parallel threads. The default value is <code>16</code>.
<br><br>
<code>--test</code><br>
Runs BarcodedIgReC on the toy test dataset. The test run is equivalent to the following command line:
<pre class = "code">
<code>
./barcoded_igrec.py -s test_dataset/barcodedIgReC_test.fasta -l all -o barigrec_test
</code>
</pre>
<code>--loci / -l &lt;str&gt;</code><br>
Immunological loci against which input reads are aligned; reads with a low alignment score are discarded (required). <br>
Available values are <code>IGH</code> / <code>IGL</code> / <code>IGK</code> / <code>IG</code> (for all BCRs) /
<code>TRA</code> / <code>TRB</code> / <code>TRG</code> / <code>TRD</code> / <code>TR</code> (for all TCRs) or <code>all</code>.
<br><br>
<code>--help</code><br>
Prints the help message.
<br><br>
<!--- --------------------- -->
<a id="advanced_opts"></a>
<h3>3.2. Advanced options:</h3>
<code>--organism &lt;str&gt;</code><br>
Organism whose germline genes are used.
Available values are <code>human</code>, <code>mouse</code>, <code>pig</code>,
<code>rabbit</code>, <code>rat</code> and <code>rhesus_monkey</code>.
The default value is <code>human</code>.
<br><br>
<code>--igrec-tau &lt;int&gt;</code><br>
Maximal allowed number of mismatches between two barcode cluster consensuses corresponding to identical antibodies.
The default (and recommended) value is 2,
which allows each barcode cluster consensus to contain a single error.
Higher values can reduce the advantage of barcoding.
Lower values may give better results for large clusters, because close sequences are not glued together,
but small clusters may then be undercorrected.
<br><br>
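<p>
    The reasoning behind the default of <code>--igrec-tau</code> can be checked with a small Python 3 sketch.
    It assumes that "mismatches" means Hamming distance between equal-length sequences;
    the function names are hypothetical and the code is not taken from BarcodedIgReC.
</p>

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def same_antibody(a, b, tau=2):
    """True if two consensuses are within tau mismatches of each other."""
    return tau >= hamming(a, b)


# Two consensuses of the same antibody, each carrying one error
# at a different position, differ from each other in two positions.
original = "ACGTACGT"
consensus_1 = "ACGAACGT"  # error at position 3
consensus_2 = "ACGTACCT"  # error at position 6

print(hamming(consensus_1, consensus_2))
print(same_antibody(consensus_1, consensus_2, tau=2))
print(same_antibody(consensus_1, consensus_2, tau=1))
```

<p>
    With a threshold of 2 the two consensuses are merged; with a threshold of 1 they would stay separate, which is the undercorrection risk for lower values.
</p>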
<code>--clustering-thr &lt;int&gt;</code><br>
Maximal allowed distance between reads sharing a barcode for them to be placed into one cluster.
The default value is 20.
Our analysis shows that this value keeps unrelated antibodies out of the same cluster while still allowing all amplification errors to be corrected.
You can increase this value for overamplified datasets to ensure better error correction.
You can decrease this value for datasets with high clonality to better distinguish close antibodies.
<a id="run_examples"></a>
<h3>3.3. Examples</h3>
To construct an antibody repertoire from the single reads in <code>reads.fastq</code>, type:
<pre class="code">
<code>
./barcoded_igrec.py -s reads.fastq -o output_dir -l all
</code>
</pre>
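<p>
    When BarcodedIgReC is launched from another script, the command line can be assembled programmatically.
    The Python 3 sketch below uses only the options described in this manual; the helper <code>build_command</code> is hypothetical,
    and treating single and paired-end input as mutually exclusive is an assumption based on the two usage forms shown above.
</p>

```python
def build_command(output_dir, loci, single=None, left=None, right=None,
                  threads=None, organism=None, igrec_tau=None,
                  clustering_thr=None):
    """Build an argument list for barcoded_igrec.py (hypothetical helper)."""
    has_single = single is not None
    has_pair = left is not None and right is not None
    has_half_pair = (left is None) != (right is None)
    if has_half_pair or has_single == has_pair:
        raise ValueError("give either single reads or both paired-end files")

    cmd = ["./barcoded_igrec.py"]
    if has_single:
        cmd += ["-s", single]
    else:
        cmd += ["-1", left, "-2", right]
    cmd += ["-o", output_dir, "-l", loci]

    optional = [
        ("-t", threads),
        ("--organism", organism),
        ("--igrec-tau", igrec_tau),
        ("--clustering-thr", clustering_thr),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


command = build_command("output_dir", "all", single="reads.fastq")
print(" ".join(command))
```

<p>
    The resulting list can be passed to <code>subprocess.run(command, check=True)</code> from the directory that contains <code>barcoded_igrec.py</code>.
</p>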
<!--- --------------------- -->
<a id="output"></a>
<h3>3.4. Output files</h3>
BarcodedIgReC creates a working directory (whose name is specified with the <code>-o</code> option)
and writes the following files to it:
<ul>
<li>Final repertoire files:</li>
<ul>
        <li><b>final_repertoire/final_repertoire.fa</b>: CLUSTERS.FASTA file for all antibody clusters of the constructed repertoire, with cluster sizes given in reads
            (details in <a href= "ig_repertoire_constructor_manual.html#repertoire_files">Antibody repertoire representation</a>).</li>
        <li><b>final_repertoire/final_repertoire_umi.fa.gz</b>: CLUSTERS.FASTA file for all antibody clusters of the constructed repertoire, with cluster sizes given in RNA molecules.</li>
        <li><b>final_repertoire/final_repertoire.rcm</b>: RCM file for the constructed repertoire
            (details in <a href = "ig_repertoire_constructor_manual.html#repertoire_files">Antibody repertoire representation</a>).</li>
</ul><br>
<li>VJ finder output:</li>
<ul>
<li>
            <b>vj_finder/cleaned_reads.fa</b>: FASTA file with cleaned reads constructed at the VJ Finder stage.
            Cleaned reads are in the forward direction (from V to J),
            contain V and J gene segments, and are cropped at the left bound of the V gene segment.
        </li>
        <li>
            <b>vj_finder/filtered_reads.fa</b>: FASTA file with filtered reads.
            Filtered reads align poorly to Ig germline gene segments and are likely to represent contamination.
        </li>
        <li>
            <b>vj_finder/alignment_info.csv</b>: CSV file containing information about the alignment of cleaned reads to
            V and J gene segments.
            Details of the <b>alignment_info.csv</b> format are given in <a href="ig_repertoire_constructor_manual.html#alignment_info">Alignment Info file format</a>.
</li>
</ul><br>
    <li><b>igrec.log</b>: full log of the BarcodedIgReC run.</li>
</ul>
<br>
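<p>
    A quick way to confirm that a run produced every file listed above is sketched below in Python 3.
    The relative paths are copied from this section; the helper name <code>missing_outputs</code> is hypothetical,
    and the temporary directory only stands in for a real output directory.
</p>

```python
import os
import tempfile

# Relative paths of the output files described in this section.
EXPECTED_OUTPUTS = [
    "final_repertoire/final_repertoire.fa",
    "final_repertoire/final_repertoire_umi.fa.gz",
    "final_repertoire/final_repertoire.rcm",
    "vj_finder/cleaned_reads.fa",
    "vj_finder/filtered_reads.fa",
    "vj_finder/alignment_info.csv",
    "igrec.log",
]


def missing_outputs(output_dir):
    """Return the expected output files that are absent from output_dir."""
    return [
        rel for rel in EXPECTED_OUTPUTS
        if not os.path.isfile(os.path.join(output_dir, *rel.split("/")))
    ]


# Demonstration on a temporary directory that contains only the log file.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "igrec.log"), "w").close()
    missing = missing_outputs(demo_dir)

for path in missing:
    print("missing:", path)
```

<p>
    For a real run, call <code>missing_outputs</code> with the directory passed to <code>-o</code>; an empty list means all documented files are present.
</p>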
<!--- -------------------------------------------------------------------- --->
<h2 id = "citation">4. Citation</h2>
If you use BarcodedIgReC in your research, please cite:
<p align = "justify">
Alexander Shlemov, Sergey Bankevich, Andrey Bzikadze,
Dmitriy M. Chudakov,
Yana Safonova,
and Pavel A. Pevzner.
Reconstructing antibody repertoires from error-prone immunosequencing datasets <i>(submitted)</i>
</p>
<a id="feedback"></a>
<h2>5. Feedback and bug reports</h2>
Your comments, bug reports, and suggestions are very welcome.
They will help us to further improve BarcodedIgReC.
<br><br>
If you have any trouble running BarcodedIgReC, please send us the log file from the output directory.
<br><br>
Address for communications: <a href="mailto:igtools_support@googlegroups.com">igtools_support@googlegroups.com</a>.
</body></html>