Usage
+Users can choose among four ways to simulate template reads:
+- use a real count matrix
+- estimate the parameters from a real count matrix to simulate a synthetic count matrix
+- specify the input parameters themselves
+- a combination of the above options
+We use SPARSim to simulate the count matrix. For more information about synthetic count matrices, please read the SPARSim documentation.
+EXAMPLES
+Sample data
+A demonstration dataset for trying out this workflow is available on Zenodo (DOI: 10.5281/zenodo.12731408). It is a subsample of a Nanopore run of the 10X 5k human PBMCs.
+The human GRCh38 reference transcriptome, GTF annotation, and FASTA reference genome can be downloaded from Ensembl.
+BASIC WORKFLOW
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+ --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+ --features gene_name \
+ --gtf dataset/genes.gtf
+
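Before launching the basic workflow, it can help to confirm that every input path exists and is non-empty. The following is a minimal sketch only: `check_inputs` is a hypothetical helper that is not part of the pipeline, and the paths are simply the ones used in the basic workflow command.

```shell
# Hypothetical helper (not part of the pipeline): report every input
# file that is missing or empty, and return non-zero if any is.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example call with the paths from the basic workflow command
if check_inputs dataset/sub_pbmc_matrice.csv \
                dataset/Homo_sapiens.GRCh38.cdna.all.fa \
                dataset/genes.gtf; then
    echo "all inputs present"
else
    echo "fix the paths listed above before running nextflow" >&2
fi
```

The check only tests for presence and size; it does not validate the CSV, FASTA, or GTF contents.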
+WITH PCR AMPLIFICATION
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+ --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+ --features gene_name \
+ --gtf dataset/GRCh38-2020-A-genes.gtf \
+ --pcr_cycles 2 \
+ --pcr_dup_rate 0.7 \
+ --pcr_error_rate 0.00003
+
+WITH SIMULATED CELL TYPE COUNTS
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+ --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+ --features gene_name \
+ --gtf dataset/GRCh38-2020-A-genes.gtf \
+ --sim_celltypes true \
+ --cell_types_annotation dataset/sub_pbmc_cell_type.csv
+
+WITH PERSONALIZED ERROR MODEL
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+ --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+ --features gene_name \
+ --gtf dataset/GRCh38-2020-A-genes.gtf \
+ --build_model true \
+ --fastq_model dataset/sub_pbmc_reads.fq \
+ --ref_genome dataset/GRCh38-2020-A-genome.fa
+
+COMPLETE WORKFLOW
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+ --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+ --features gene_name \
+ --gtf dataset/GRCh38-2020-A-genes.gtf \
+ --sim_celltypes true \
+ --cell_types_annotation dataset/sub_pbmc_cell_type.csv \
+ --build_model true \
+ --fastq_model dataset/sub_pbmc_reads.fq \
+ --ref_genome dataset/GRCh38-2020-A-genome.fa \
+ --pcr_cycles 2 \
+ --pcr_dup_rate 0.7 \
+ --pcr_error_rate 0.00003
+
+Results
+After execution, results are available in the directory specified by --outdir. They include the simulated Nanopore reads (.fastq), along with log files and a QC report.
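As a quick sanity check on the simulated reads, the number of records in a FASTQ file can be counted from the shell. This is a sketch under assumptions: `count_fastq_reads` is a hypothetical helper, and it assumes an uncompressed FASTQ with the usual four lines per record. The demo writes its own two-read file so that it runs without any pipeline output.

```shell
# Hypothetical helper: count records in an uncompressed FASTQ file,
# assuming the standard four-lines-per-record layout.
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Self-contained demo on a two-read FASTQ written to a temporary file
demo_fastq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n' > "$demo_fastq"
count_fastq_reads "$demo_fastq"
rm -f "$demo_fastq"
```

On a real run, point the function at the .fastq file inside your --outdir; the exact file name depends on the pipeline and is not given here.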
Cleaning Up
+To clean up temporary files generated by Nextflow:
+nextflow clean -f
+