Skip to content

Commit

Permalink
Merge pull request #155 from hasindu2008/rna004
Browse files Browse the repository at this point in the history
Rna004
  • Loading branch information
hasindu2008 committed Feb 17, 2024
2 parents 2a29d1f + 6c4287c commit f864294
Show file tree
Hide file tree
Showing 16 changed files with 262,236 additions and 52 deletions.
11 changes: 8 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,12 @@
# f5c

An optimised re-implementation of the *index*, *call-methylation* and *eventalign* modules in [Nanopolish](https://github.com/jts/nanopolish). Given a set of basecalled Nanopore reads and the raw signals, *f5c call-methylation* detects the methylated cytosine and *f5c eventalign* aligns raw nanopore signals (events) to the reference k-mers. *f5c* can optionally utilise NVIDIA graphics cards for acceleration. **f5c v1.2 onwards support the latest nanopore R10.4.1 chemistry (make sure to specify --pore r10 if input is FAST5, autodetected for S/BLOW5 input)**. For best performance and easy usability, it is recommended to use f5c on [BLOW5 format](https://www.nature.com/articles/s41587-021-01147-4). Use [slow5tools](https://github.com/hasindu2008/slow5tools) for FAST5->BLOW5 conversion and [blue-crab](https://github.com/Psy-Fer/blue-crab) for POD5->BLOW5 conversion.
An optimised re-implementation of the *index*, *call-methylation* and *eventalign* modules in [Nanopolish](https://github.com/jts/nanopolish). Given a set of basecalled Nanopore reads and the raw signals, *f5c call-methylation* detects the methylated cytosine and *f5c eventalign* aligns raw nanopore signals (events) to the reference k-mers. *f5c* can optionally utilise NVIDIA graphics cards for acceleration. For best performance and easy usability, it is recommended to use f5c on [BLOW5 format](https://www.nature.com/articles/s41587-021-01147-4). Use [slow5tools](https://github.com/hasindu2008/slow5tools) for FAST5->BLOW5 conversion and [blue-crab](https://github.com/Psy-Fer/blue-crab) for POD5->BLOW5 conversion.

First, the reads have to be indexed using `f5c index`. Then, invoke `f5c call-methylation` to detect methylated cytosine bases. Finally, you may use `f5c meth-freq` to obtain methylation frequencies. Alternatively, invoke `f5c eventalign` to perform event alignment. The results are almost the same as from nanopolish except a few differences due to floating point approximations.

- **f5c v1.2 onwards support nanopore R10.4.1 chemistry (must specify --pore r10 if FAST5 input, autodetected for S/BLOW5 input)**.
- **f5c v1.4 onwards support nanopore RNA004 chemistry (make specify --pore rna004 if FAST5 input, autodetected for S/BLOW5 input)**.

*Full Documentation* : [https://hasindu2008.github.io/f5c/docs/overview](https://hasindu2008.github.io/f5c/docs/overview)<br/>
*Latest release* : [https://github.com/hasindu2008/f5c/releases/latest](https://github.com/hasindu2008/f5c/releases/latest)<br/>
*Pre-print* : [https://doi.org/10.1101/756122](https://www.biorxiv.org/content/10.1101/756122v1)<br/>
Expand Down Expand Up @@ -115,7 +118,7 @@ f5c index --slow5 [slow5_file] [read.fastq|fasta] # for S/BLOW5
# for S/BLOW5 (R9.4 or R10.4 DNA data)
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --slow5 [slow5_file] > [meth.tsv]
# for FAST5, R9.4 DNA data
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] > [meth.tsv]
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] > [meth.tsv]
# for FAST5, R10.4 DNA data
f5c call-methylation -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --pore r10 > [meth.tsv]
# methylation frequency
Expand All @@ -131,6 +134,8 @@ f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] > [even
f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --rna > [events.tsv]
# for FAST5 (R10.4 DNA data)
f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --pore r10 > [events.tsv]
# for FAST5 (RNA004 RNA data)
f5c eventalign -b [reads.sorted.bam] -g [ref.fa] -r [reads.fastq|fasta] --pore rna004 --rna > [events.tsv]
```

Visit the [man page](https://hasindu2008.github.io/f5c/docs/commands) for all the commands and options.
Expand All @@ -140,7 +145,7 @@ Visit the [man page](https://hasindu2008.github.io/f5c/docs/commands) for all th
Follow the same steps as in [Nanopolish tutorial](https://nanopolish.readthedocs.io/en/latest/quickstart_call_methylation.html) while replacing `nanopolish` with `f5c`. If you only want to perform a quick test of f5c :
```sh
#download and extract the dataset including sorted alignments
wget -O f5c_na12878_test.tgz "https://f5c.page.link/f5c_na12878_test"
wget -O f5c_na12878_test.tgz "https://f5c.bioinf.science/f5c_na12878_test"
tar xf f5c_na12878_test.tgz

###### Using S/BLOW5 as input (recommended) ######
Expand Down
2 changes: 1 addition & 1 deletion scripts/install-vbz.sh
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ print() {
echo -e "${GREEN}$1${NC}" >&2
}

MANUAL_LINK="https://f5c.page.link/troubleshoot"
MANUAL_LINK="https://f5c.bioinf.science/troubleshoot"

uname -o || die "Could not determine the O/S. See ${MANUAL_LINK}"
uname -m || die "Could not determine the architecture. See ${MANUAL_LINK}"
Expand Down
12 changes: 6 additions & 6 deletions scripts/test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@ else
fi
# execution mode (valgrind/gdb/cpu/cuda/echo)
mode=
testset_url="https://f5c.page.link/f5c_ecoli_2kb_region_test"
fallback_url="https://f5c.page.link/f5c_ecoli_2kb_region_test_fallback"
testset_url="https://f5c.bioinf.science/f5c_ecoli_2kb_region_test"
fallback_url="https://f5c.bioinf.science/f5c_ecoli_2kb_region_test_fallback"

# download test set given url
#
Expand Down Expand Up @@ -131,17 +131,17 @@ do
bamfile=${testdir}/reads.sorted.bam
ref=${testdir}/humangenome.fa
reads=${testdir}/reads.fastq
testset_url="https://f5c.page.link/f5c_na12878_test"
fallback_url="https://f5c.page.link/f5c_na12878_test_fallback";;
testset_url="https://f5c.bioinf.science/f5c_na12878_test"
fallback_url="https://f5c.bioinf.science/f5c_na12878_test_fallback";;
f) testdir=test/hg2_lsk114_reads_1000
reads=${testdir}/PGXX22394_reads_1000_6.4.2_sup.fastq
slow5=${testdir}/PGXX22394_reads_1000.blow5
ref=test/chr22_meth_example/humangenome.fa
bamfile=${testdir}/PGXX22394_reads_1000_6.4.2_sup.bam
testset_url="https://f5c.page.link/hg2_lsk114_reads_1000";;
testset_url="https://f5c.bioinf.science/hg2_lsk114_reads_1000";;
K) batchsize="$OPTARG";;
B) max_bases="$OPTARG";;
d) download_test_set "https://f5c.page.link/f5c_na12878_test" "https://f5c.page.link/f5c_na12878_test_fallback"
d) download_test_set "https://f5c.bioinf.science/f5c_na12878_test" "https://f5c.bioinf.science/f5c_na12878_test_fallback"
exit 0;;
h) help_msg
exit 0;;
Expand Down
2 changes: 1 addition & 1 deletion src/error.h
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ static inline void malloc_chk(void* ret, const char* func, const char* file,
"[%s::ERROR]\033[1;31m Failed to allocate memory : "
"%s.\033[0m\n[%s::DEBUG]\033[1;35m Error occured at %s:%d. Try with a small batchsize (-K and/or -B options),"
"fewer threads (-t) or skip ultra-long reads (--skip-ultra) to reduce the peak memory."
"See https://f5c.page.link/troubleshoot for details.\033[0m\n\n",
"See https://f5c.bioinf.science/troubleshoot for details.\033[0m\n\n",
func, strerror(errno), func, file, line);
exit(EXIT_FAILURE);
}
Expand Down
33 changes: 23 additions & 10 deletions src/f5c.c
Original file line number Diff line number Diff line change
Expand Up @@ -118,16 +118,18 @@ int8_t drna_detect(slow5_file_t *sp){
int8_t pore_detect(slow5_file_t *sp){

const slow5_hdr_t* hdr = sp->header;
int8_t r10 = 0;
int8_t pore = OPT_PORE_R9;
char *kit =slow5_hdr_get("sequencing_kit", 0, hdr);
if(kit==NULL){
WARNING("%s","sequencing_kit not found in SLOW5 header. Assuming R9.4.1");
return 0;
}
if (strstr(kit,"114")==NULL){
r10 = 0;
if (strstr(kit,"114")!=NULL){
pore = OPT_PORE_R10;
} else if (strstr(kit,"rna004")!=NULL){
pore = OPT_PORE_RNA004;
} else {
r10 = 1;
pore = OPT_PORE_R9;
}

for(uint32_t i=1; i < hdr->num_read_groups; i++){
Expand All @@ -136,7 +138,7 @@ int8_t pore_detect(slow5_file_t *sp){
WARNING("sequencing_kit type mismatch: %s != %s in read group %d. Defaulted to %s", curr, kit, i, kit);
}
}
return r10;
return pore;
}

/* initialise the core data structure */
Expand Down Expand Up @@ -255,9 +257,20 @@ core_t* init_core(const char* bamfilename, const char* fastafile,
}
}

if(opt.pore==NULL && pore_detect(core->sf)){
opt.flag |= F5C_R10;
if(opt.verbosity > 1) fprintf(stderr,"Detected R10 data. --pore r10 was set automatically.\n");

if(opt.pore==NULL){
int8_t pore = pore_detect(core->sf);
if(pore){
opt.flag |= F5C_R10;
if(opt.verbosity > 1) {
if (pore == OPT_PORE_R10) fprintf(stderr,"Detected R10 data. --pore r10 was set automatically.\n");
if (pore == OPT_PORE_RNA004) fprintf(stderr,"Detected RNA004 data. --pore rna004 was set automatically.\n");
if (pore == OPT_PORE_R10 && (opt.flag & F5C_RNA)){
ERROR("%s","R10 RNA data does not exist! But header header indicates that the data is R10 RNA.");
exit(EXIT_FAILURE);
}
}
}
}
}

Expand Down Expand Up @@ -287,8 +300,8 @@ core_t* init_core(const char* bamfilename, const char* fastafile,
} else {
if(opt.flag & F5C_RNA){
if(opt.flag & F5C_R10){
ERROR("%s","RNA R10 model is not available");
exit(EXIT_FAILURE);
INFO("%s","builtin RNA004 nucleotide model loaded");
kmer_size=set_model(core->model, MODEL_ID_RNA_RNA004_NUCLEOTIDE);
} else {
INFO("%s","builtin RNA R9 nucleotide model loaded");
kmer_size=set_model(core->model, MODEL_ID_RNA_R9_NUCLEOTIDE);
Expand Down
5 changes: 5 additions & 0 deletions src/f5cmisc.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,12 @@
#define MODEL_ID_RNA_R9_NUCLEOTIDE 3
#define MODEL_ID_DNA_R10_NUCLEOTIDE 4
#define MODEL_ID_DNA_R10_CPG 5
#define MODEL_ID_RNA_RNA004_NUCLEOTIDE 6

///opts for autodetect
#define OPT_PORE_R9 0
#define OPT_PORE_R10 1
#define OPT_PORE_RNA004 2

// Flags to modify the behaviour of the HMM
enum HMMAlignmentFlags
Expand Down
2 changes: 1 addition & 1 deletion src/fast5lite.h
Original file line number Diff line number Diff line change
Expand Up @@ -294,7 +294,7 @@ static inline int32_t fast5_read_multi_fast5(fast5_file_t fh, signal_t* sig, std
if (status < 0) {
free(sig->rawptr);
if(fast5_is_vbz_compressed(fh, read_id) == 1) {
ERROR("%s","The fast5 file is compressed with VBZ but the required plugin is not loaded. See https://f5c.page.link/troubleshoot for instructions.\n");
ERROR("%s","The fast5 file is compressed with VBZ but the required plugin is not loaded. See https://f5c.bioinf.science/troubleshoot for instructions.\n");
}
WARNING("Failed to read raw data from dataset %s.", signal_path);
H5Sclose(space);
Expand Down
2 changes: 1 addition & 1 deletion src/freq.c
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ static const char usage[] = "Usage: %s [OPTIONS] -i methcalls.tsv\n"
" -s split groups\n"
" -h help\n"
" --version print version\n"
"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.page.link/man).\n"
"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.bioinf.science/man).\n"
;

struct site_stats {
Expand Down
2 changes: 1 addition & 1 deletion src/freq_merge.c
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,7 @@ static const char *MAP_REDUCE_USAGE_MESSAGE =
" -o FILE output file. Write to stdout if not specified\n"
" -h help\n"
" --version print version\n"
"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.page.link/man)."
"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.bioinf.science/man)."
"\n\n"
;

Expand Down
2 changes: 1 addition & 1 deletion src/index.c
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ static const char *INDEX_USAGE_MESSAGE =
" --skip-slow5-idx do not build the .idx for the slow5 file (useful when a slow5 index is already available)\n"
" --verbose INT verbosity level\n"
" --version print version\n"
"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.page.link/man)."
"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.bioinf.science/man)."
"\n\n"
;

Expand Down
2 changes: 1 addition & 1 deletion src/main.c
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ int print_usage(FILE *fp_help){
fprintf(fp_help," eventalign Align nanopore events to reference k-mers (optimised nanopolish eventalign)\n");
fprintf(fp_help," freq-merge Merge calculated methylation frequency tsv files\n");
fprintf(fp_help," resquiggle Align raw signals to basecalled reads\n");
fprintf(fp_help,"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.page.link/man).\n");
fprintf(fp_help,"\nSee the manual page for details (`man ./docs/f5c.1' or https://f5c.bioinf.science/man).\n");
if(fp_help==stderr){
exit(EXIT_FAILURE);
}
Expand Down
Loading

0 comments on commit f864294

Please sign in to comment.