Merge branch 'gh-page' of https://github.com/alihamraoui/AsaruSim int…

…o gh-page
GenomiqueENS · Jul 12, 2024 · e3cc72f
2 parents 9849b15 + 76c40db
commit e3cc72f
Showing 2 changed files with 67 additions and 190 deletions.
diff --git a/README.md b/README.md
@@ -57,7 +57,7 @@ Configuration for error model:
 | `error_model`      | Custom error model file (optional)                            | `null`                                        |
 | `qscore_model`     | Custom Q-score model file (optional)                          | `null`                                        |
 | `build_model`      | to build your own error/Q-score model                         | `false`                                       |
-| `model_fastq`      | reference real read (.fastq) to train error model   (optional)      | `false`                                       |
+| `fastq_model`      | reference real reads (.fastq) to train error model (optional) | `false`                                       |
 | `ref_genome`       | reference genome .fasta file (optional)                       | `false`                                       |
 
 ### Additional Parameters
@@ -78,17 +78,7 @@ Configuration for running the workflow:
 | `container`       | Docker container for the workflow  | `'hamraouii/wf-SLSim'`    |
 | `docker.runOptions` | Docker run options to use       | `'-u $(id -u):$(id -g)'`  |
 
-## Execution
-
-To run the workflow with basic example:
-
-You can simulate a realistic distribution of per-barcode UMI counts by providing, in addition to the filtered count matrix, a .csv file of BC counts. AsaruSim will add random transcript counts to fit the real distribution.
-```bash
-nextflow run main.nf --matrix test_data/matrix.csv \
-                     --bc_counts test_data/test_bc.csv \
-                     --transcriptome test_data/transcriptome.fa \
-
-```
+## Usage
 Users can choose among 4 ways to simulate template reads.
 - use a real count matrix
 - estimate the parameters from a real count matrix to simulate a synthetic count matrix
@@ -97,10 +87,72 @@ User can choose among 4 ways to simulate template reads.
 
 We use the SPARSIM tool to simulate count matrices. For more information about synthetic count matrices, please read the [SPARSIM](https://gitlab.com/sysbiobig/sparsim/-/blob/master/vignettes/sparsim.Rmd?ref_type=heads#Sec_Input_parameter_estimated_from_data) documentation.
 
-#### use a real count matrix
+### EXAMPLES 
+##### Sample data
+A demonstration dataset for trying out this workflow is available on Zenodo (DOI: [10.5281/zenodo.12731408](https://zenodo.org/records/12731409)). This dataset is a subsample from a Nanopore run of the [10X 5k human PBMCs](https://www.10xgenomics.com/datasets/5k-human-pbmcs-3-v3-1-chromium-controller-3-1-standard).
+
+The human GRCh38 [reference transcriptome](https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/cdna/), [GTF annotation](https://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/) and [FASTA reference genome](https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/) can be downloaded from Ensembl.
+
+
+##### BASIC WORKFLOW
+
+```bash
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/genes.gtf
+```
+
+##### WITH PCR AMPLIFICATION
+
+```bash
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/GRCh38-2020-A-genes.gtf \
+                      --pcr_cycles 2 \
+                      --pcr_dup_rate 0.7 \
+                      --pcr_error_rate 0.00003
+```
+
+##### WITH SIMULATED CELL TYPE COUNTS
+
+```bash
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/GRCh38-2020-A-genes.gtf \
+                      --sim_celltypes true \
+                      --cell_types_annotation dataset/sub_pbmc_cell_type.csv
+```
+
+##### WITH PERSONALIZED ERROR MODEL
+
+```bash
+nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                     --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                     --features gene_name \
+                     --gtf dataset/GRCh38-2020-A-genes.gtf \
+                     --build_model true \
+                     --fastq_model dataset/sub_pbmc_reads.fq \
+                     --ref_genome dataset/GRCh38-2020-A-genome.fa 
+```
+
+##### COMPLETE WORKFLOW
+
 ```bash
-nextflow run main.nf --matrix test_data/matrix.csv \
-                     --transcriptome test_data/transcriptome.fa
+ nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
+                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
+                      --features gene_name \
+                      --gtf dataset/GRCh38-2020-A-genes.gtf \
+                      --sim_celltypes true \
+                      --cell_types_annotation dataset/sub_pbmc_cell_type.csv \
+                      --build_model true \
+                      --fastq_model dataset/sub_pbmc_reads.fq \
+                      --ref_genome dataset/GRCh38-2020-A-genome.fa \
+                      --pcr_cycles 2 \
+                      --pcr_dup_rate 0.7 \
+                      --pcr_error_rate 0.00003
 ```
 
 ## Results

diff --git a/index.html b/index.html