Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

running into problem with Xpore after merging multiple .fastq and .bam files from same Nanopore run #179

Open
annapopova-scripps opened this issue Apr 12, 2023 · 7 comments

Comments

@annapopova-scripps
Copy link

Dear Xpore developers,

Thank you for the software. It was easy to install and handle , until I decided to merge all 50+ .fastq (and .bam) output files from one nanopore run. Each .fastq roughly has 3-4K reads and on its own produces meaningful Xpore output table, when running in comparative mode. Then, I decided to merge five different .fastq (using cat) and .bam (using samtools merge), and no problem was met at the nanopolish index and nanopolish eventalign steps, consistently I observed that eventalign.txt was five times larger than for a single .fastq. However, Xpore dataprep and Xpore diffmod returned dataprep folder and diffmod.table as if a single .fastq was used.

I am currently limited to adding all individual .fastq files as replicates (below), which is stupid. So, I hope, you may have a good recommendation for the situation.

Anna

data:
IVT:
REP1: ./IVT_dataprep/IVT_1_dataprep
REP2: ./IVT_dataprep/IVT_2_dataprep
REP3: ./IVT_dataprep/IVT_3_dataprep
REP4: ./IVT_dataprep/IVT_4_dataprep
REP5: ./IVT_dataprep/IVT_5_dataprep
REP6: ./IVT_dataprep/IVT_6_dataprep
REP7: ./IVT_dataprep/IVT_7_dataprep
REP8: ./IVT_dataprep/IVT_8_dataprep
REP9: ./IVT_dataprep/IVT_9_dataprep
REP10: ./IVT_dataprep/IVT_10_dataprep

@yuukiiwa
Copy link
Collaborator

Hi @annapopova-scripps,

Do you mind sharing the first 10 lines of the eventalign.txt file, please?

I suspect that some entries in the eventalign.txt file have the transcript/gene version while some do not. xpore strips off the gene/transcript version which may explain why it is the size of a single fastq/bam diffmod.table.

Thanks!

Best wishes,
Yuk Kei

@annapopova-scripps
Copy link
Author

Dear Yuk Kei,

Here are the top 10 lines of the eventalign.txt for the five .fastq files I merged.

Thank you for looking into the issue,

Anna

anna@supernova:~/ONT_files/ONT_runs/nanopolish/Xpore_1/70S$ head 70S_10to14_eventalign.txt
contig position reference_kmer read_index strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx
RRSB-RRNA 9 AGTTT 19 t 1 111.91 8.435 0.00199AGTTT 124.34 7.49 -1.46 65656 65662
RRSB-RRNA 9 AGTTT 19 t 2 128.19 5.800 0.01527AGTTT 124.34 7.49 0.45 65610 65656
RRSB-RRNA 10 GTTTG 19 t 3 79.84 1.955 0.01560GTTTG 78.97 1.97 0.39 65563 65610
RRSB-RRNA 11 TTTGA 19 t 4 89.31 2.765 0.01394TTTGA 89.94 2.65 -0.21 65521 65563
RRSB-RRNA 11 TTTGA 19 t 5 88.84 1.758 0.00764TTTGA 89.94 2.65 -0.37 65498 65521
RRSB-RRNA 11 TTTGA 19 t 6 88.55 1.466 0.00963TTTGA 89.94 2.65 -0.46 65469 65498
RRSB-RRNA 12 TTGAT 19 t 7 105.18 2.992 0.00564TTGAT 108.32 4.49 -0.62 65452 65469
RRSB-RRNA 12 TTGAT 19 t 8 103.04 1.315 0.00365TTGAT 108.32 4.49 -1.04 65441 65452
RRSB-RRNA 12 TTGAT 19 t 9 107.48 2.771 0.00232TTGAT 108.32 4.49 -0.16 65434 65441

@yuukiiwa
Copy link
Collaborator

Hi @annapopova-scripps,

Sorry for the delayed reply!

I supposed if your eventalign.txt is 5 times larger, then your dataprep is also 5 times larger. Still, when you are comparing the two conditions, the sites that differ are the same, which gives you a diffmod.table with a similar size to using a single .fastq file.

Thanks!

Best wishes,
Yuk Kei

@annapopova-scripps
Copy link
Author

Hi Yuk Kei,

My eventalign.txt is 5 times larger (5x), BUT dataprep is only 1x, and clearly in the diffmod.table the coverage never exceeds 1000. In fact, from a single .fastq I should get a coverage of ~1500, and from five .fastq files ~7500 per site.

One thing I noticed is that every data.readcount from a single .fastq is the same:
idx,n_reads
RRSB-RRNA,1001
RRLB-RRNA,1001
Which makes me suspicious that dataprep has a cut off of 1000 per gene. No change every fastq. has same number of reads :)
Do you think this is true?

Best,
Anna

@yuukiiwa
Copy link
Collaborator

yuukiiwa commented Apr 28, 2023

Hi Anna,

The --readcount_max [READCOUNT_MAX] is set to 1,000 reads by default. You can customize it by passing the --readcount_max [READCOUNT_MAX] when running xpore dataprep.

Thanks!

Best wishes,
Yuk Kei

@annapopova-scripps
Copy link
Author

Hi Yuk Kei,

Thank you for the feedback, that fixed the issue and things look consistent now :)

I want to bother you for few more moments:

  1. I noticed that --n_processes option neither in xpore dataprep nor in xpore diffmod when specified does not engage more than one processor in Linux. Did you guys have a chance to evaluate the parallel version of xpore?

  2. I have spent time to play with positions returned in the diffmod.table and got an idea how to use z_score and diff_mod_rate combination to filter out most modification and variant positions in the set of control samples. Still, I am worried that diffmod.table contains 10-30 more entries than var/mod positions, and that positions that are known to be unmodified have distinct mu_umod and mu_mod values, and significant individual mod_rate values (as high as 0.9). To test this, I took an in-vitro transcribed dataset, split it into two datasets and ran xpore in a comparative mode. There were hundreds of entries in the diffmod.table with apparently bimodal distribution of ion current detected by Xpore. This is quite alarming to me, and I am wondering if something can be done to adjust the reference model for that or addressed otherwise to make Xpore more sensitive to real modified positions and quantify stoichiometry.

Sincerely,

Anna

@yuukiiwa
Copy link
Collaborator

Hi Anna,

  1. Because of io, --n_processes uses at most 2 cpus in xpore dataprep. Still, parallel processing should work well in xpore diffmod from my experience.
  2. Thanks for raising this! I will discuss with @ploy-np on this.

Thanks!

Best wishes,
Yuk Kei

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants