- +

Usage

+

User can choose among 4 ways to simulate template reads. +- use a real count matrix +- estimated the parameter from a real count matrix to simulate synthetic count matrix +- specified by his/her own the input parameter +- a combination of the above options

+

We use SPARSIM tools to simulate count matrix. for more information a bout synthetic count matrix, please read SPARSIM documentaion.

+

EXAMPLES

+
Sample data
+

A demonstration dataset to initiate this workflow is accessible on zenodo DOI : 10.5281/zenodo.12731408. This dataset is a subsample from a Nanopore run of the 10X 5k human pbmcs.

+

The human GRCh38 reference transcriptome, gtf annotation and fasta referance genome can be downloaded from Ensembl.

+
BASIC WORKFLOW
+
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/genes.gtf
+
+
WITH PCR AMPLIFICTION
+
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/GRCh38-2020-A-genes.gtf \
+                      --pcr_cycles 2 \
+                      --pcr_dup_rate 0.7 \
+                      --pcr_error_rate 0.00003
+
+
WITH SIMULATED CELL TYPE COUNTS
+
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/GRCh38-2020-A-genes.gtf \
+                      --sim_celltypes true \
+                      --cell_types_annotation dataset/sub_pbmc_cell_type.csv
+
+
WITH PERSONALIZED ERROR MODEL
+
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                     --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                     --features gene_name \
+                     --gtf dataset/GRCh38-2020-A-genes.gtf \
+                     --build_model true \
+                     --fastq_model dataset/sub_pbmc_reads.fq \
+                     --ref_genome dataset/GRCh38-2020-A-genome.fa 
+
+
COMPLETE WORKFLOW
+
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/GRCh38-2020-A-genes.gtf \
+                      --sim_celltypes true \
+                      --cell_types_annotation dataset/sub_pbmc_cell_type.csv
+                      --build_model true \
+                      --fastq_model dataset/sub_pbmc_reads.fq \
+                      --ref_genome dataset/GRCh38-2020-A-genome.fa 
+                      --pcr_cycles 2 \
+                      --pcr_dup_rate 0.7 \
+                      --pcr_error_rate 0.00003
+
+

Results

+

After execution, results will be available in the specified --outdir. This includes simulated Nanopore reads .fastq, along with log files and QC report.

+

Cleaning Up

+

To clean up temporary files generated by Nextflow:

+
nextflow clean -f
+