Skip to content

Commit

Permalink
rna004 inbuilt
Browse files Browse the repository at this point in the history
  • Loading branch information
hasindu2008 committed Jan 18, 2024
1 parent 2a29d1f commit c6ce1ea
Show file tree
Hide file tree
Showing 7 changed files with 262,205 additions and 24 deletions.
9 changes: 7 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
# f5c

An optimised re-implementation of the *index*, *call-methylation* and *eventalign* modules in [Nanopolish](https://github.com/jts/nanopolish). Given a set of basecalled Nanopore reads and the raw signals, *f5c call-methylation* detects the methylated cytosine and *f5c eventalign* aligns raw nanopore signals (events) to the reference k-mers. *f5c* can optionally utilise NVIDIA graphics cards for acceleration. **f5c v1.2 onwards support the latest nanopore R10.4.1 chemistry (make sure to specify --pore r10 if input is FAST5, autodetected for S/BLOW5 input)**. For best performance and easy usability, it is recommended to use f5c on [BLOW5 format](https://www.nature.com/articles/s41587-021-01147-4). Use [slow5tools](https://github.com/hasindu2008/slow5tools) for FAST5->BLOW5 conversion and [blue-crab](https://github.com/Psy-Fer/blue-crab) for POD5->BLOW5 conversion.
An optimised re-implementation of the *index*, *call-methylation* and *eventalign* modules in [Nanopolish](https://github.com/jts/nanopolish). Given a set of basecalled Nanopore reads and the raw signals, *f5c call-methylation* detects the methylated cytosine and *f5c eventalign* aligns raw nanopore signals (events) to the reference k-mers. *f5c* can optionally utilise NVIDIA graphics cards for acceleration. For best performance and easy usability, it is recommended to use f5c on [BLOW5 format](https://www.nature.com/articles/s41587-021-01147-4). Use [slow5tools](https://github.com/hasindu2008/slow5tools) for FAST5->BLOW5 conversion and [blue-crab](https://github.com/Psy-Fer/blue-crab) for POD5->BLOW5 conversion.

First, the reads have to be indexed using `f5c index`. Then, invoke `f5c call-methylation` to detect methylated cytosine bases. Finally, you may use `f5c meth-freq` to obtain methylation frequencies. Alternatively, invoke `f5c eventalign` to perform event alignment. The results are almost the same as from nanopolish except a few differences due to floating point approximations.

- **f5c v1.2 onwards support nanopore R10.4.1 chemistry (must specify --pore r10 if FAST5 input, autodetected for S/BLOW5 input)**.
- **f5c v1.4 onwards support nanopore RNA004 chemistry (make specify --pore rna004 if FAST5 input, autodetected for S/BLOW5 input)**.

*Full Documentation* : [https://hasindu2008.github.io/f5c/docs/overview](https://hasindu2008.github.io/f5c/docs/overview)<br/>
*Latest release* : [https://github.com/hasindu2008/f5c/releases/latest](https://github.com/hasindu2008/f5c/releases/latest)<br/>
*Pre-print* : [https://doi.org/10.1101/756122](https://www.biorxiv.org/content/10.1101/756122v1)<br/>
Expand Down Expand Up @@ -115,7 +118,7 @@ f5c index --slow5 [slow5_file] [read.fastq|fasta] # for S/BLOW5
# for S/BLOW5 (R9.4 or R10.4 DNA data)
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --slow5 [slow5_file] > [meth.tsv]
# for FAST5, R9.4 DNA data
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] > [meth.tsv]
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] > [meth.tsv]
# for FAST5, R10.4 DNA data
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --pore r10 > [meth.tsv]
# methylation frequency
Expand All @@ -131,6 +134,8 @@ f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] > [even
f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --rna > [events.tsv]
# for FAST5 (R10.4 DNA data)
f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --pore r10 > [events.tsv]
# for FAST5 (RNA004 RNA data)
f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --pore rna004 --rna > [events.tsv]
```

Visit the [man page](https://hasindu2008.github.io/f5c/docs/commands) for all the commands and options.
Expand Down
33 changes: 23 additions & 10 deletions src/f5c.c
Original file line number Diff line number Diff line change
Expand Up @@ -118,16 +118,18 @@ int8_t drna_detect(slow5_file_t *sp){
int8_t pore_detect(slow5_file_t *sp){

const slow5_hdr_t* hdr = sp->header;
int8_t r10 = 0;
int8_t pore = 0;
char *kit =slow5_hdr_get("sequencing_kit", 0, hdr);
if(kit==NULL){
WARNING("%s","sequencing_kit not found in SLOW5 header. Assuming R9.4.1");
return 0;
}
if (strstr(kit,"114")==NULL){
r10 = 0;
if (strstr(kit,"114")!=NULL){
pore = 1;
} else if (strstr(kit,"rna004")!=NULL){
pore = 2;
} else {
r10 = 1;
pore = 0;
}

for(uint32_t i=1; i < hdr->num_read_groups; i++){
Expand All @@ -136,7 +138,7 @@ int8_t pore_detect(slow5_file_t *sp){
WARNING("sequencing_kit type mismatch: %s != %s in read group %d. Defaulted to %s", curr, kit, i, kit);
}
}
return r10;
return pore;
}

/* initialise the core data structure */
Expand Down Expand Up @@ -255,9 +257,20 @@ core_t* init_core(const char* bamfilename, const char* fastafile,
}
}

if(opt.pore==NULL && pore_detect(core->sf)){
opt.flag |= F5C_R10;
if(opt.verbosity > 1) fprintf(stderr,"Detected R10 data. --pore r10 was set automatically.\n");

if(opt.pore==NULL){
int8_t pore = pore_detect(core->sf);
if(pore){
opt.flag |= F5C_R10;
if(opt.verbosity > 1) {
if (pore == 1) fprintf(stderr,"Detected R10 data. --pore r10 was set automatically.\n");
if (pore == 2) fprintf(stderr,"Detected RNA004 data. --pore rna004 was set automatically.\n");
if (pore == 1 && (opt.flag & F5C_RNA)){
ERROR("%s","R10 RNA data does not exist! But header header indicates that the data is R10 RNA.");
exit(EXIT_FAILURE);
}
}
}
}
}

Expand Down Expand Up @@ -287,8 +300,8 @@ core_t* init_core(const char* bamfilename, const char* fastafile,
} else {
if(opt.flag & F5C_RNA){
if(opt.flag & F5C_R10){
ERROR("%s","RNA R10 model is not available");
exit(EXIT_FAILURE);
INFO("%s","builtin RNA004 nucleotide model loaded");
kmer_size=set_model(core->model, MODEL_ID_RNA_RNA004_NUCLEOTIDE);
} else {
INFO("%s","builtin RNA R9 nucleotide model loaded");
kmer_size=set_model(core->model, MODEL_ID_RNA_R9_NUCLEOTIDE);
Expand Down
1 change: 1 addition & 0 deletions src/f5cmisc.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#define MODEL_ID_RNA_R9_NUCLEOTIDE 3
#define MODEL_ID_DNA_R10_NUCLEOTIDE 4
#define MODEL_ID_DNA_R10_CPG 5
#define MODEL_ID_RNA_RNA004_NUCLEOTIDE 6


// Flags to modify the behaviour of the HMM
Expand Down
Loading

0 comments on commit c6ce1ea

Please sign in to comment.