Merge branch 'gh-page' of https://github.com/alihamraoui/AsaruSim into gh-page
alihamraoui committed Jul 12, 2024
2 parents 9849b15 + 76c40db commit e3cc72f
Showing 2 changed files with 67 additions and 190 deletions.
82 changes: 67 additions & 15 deletions README.md
@@ -57,7 +57,7 @@ Configuration for error model:
| `error_model` | Custom error model file (optional) | `null` |
| `qscore_model` | Custom Q-score model file (optional) | `null` |
| `build_model` | Build your own error/Q-score model | `false` |
| `model_fastq` | Real reference reads (.fastq) used to train the error model (optional) | `false` |
| `fastq_model` | Real reference reads (.fastq) used to train the error model (optional) | `false` |
| `ref_genome` | Reference genome .fasta file (optional) | `false` |

### Additional Parameters
@@ -78,17 +78,7 @@ Configuration for running the workflow:
| `container` | Docker container for the workflow | `'hamraouii/wf-SLSim'` |
| `docker.runOptions` | Docker run options to use | `'-u $(id -u):$(id -g)'` |

## Execution

To run the workflow with a basic example:

You can simulate a realistic distribution of per-barcode UMI counts by providing, in addition to the filtered count matrix, a .csv file of barcode (BC) counts. AsaruSim will add random transcript counts to fit the real distribution.
```bash
nextflow run main.nf --matrix test_data/matrix.csv \
--bc_counts test_data/test_bc.csv \
--transcriptome test_data/transcriptome.fa
```
## Usage
Users can choose among 4 ways to simulate template reads:
- use a real count matrix
- estimate the parameters from a real count matrix to simulate a synthetic count matrix
@@ -97,10 +87,72 @@ User can choose among 4 ways to simulate template reads.

We use the SPARSIM tool to simulate the count matrix. For more information about synthetic count matrices, please read the [SPARSIM](https://gitlab.com/sysbiobig/sparsim/-/blob/master/vignettes/sparsim.Rmd?ref_type=heads#Sec_Input_parameter_estimated_from_data) documentation.

#### use a real count matrix
### EXAMPLES
##### Sample data
A demonstration dataset for trying out this workflow is available on Zenodo (DOI: [10.5281/zenodo.12731408](https://zenodo.org/records/12731409)). This dataset is a subsample from a Nanopore run of the [10X 5k human PBMCs](https://www.10xgenomics.com/datasets/5k-human-pbmcs-3-v3-1-chromium-controller-3-1-standard) sample.

The human GRCh38 [reference transcriptome](https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/cdna/), [GTF annotation](https://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/) and [FASTA reference genome](https://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/) can be downloaded from Ensembl.
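
As a rough sketch of how the reference files could be fetched, the snippet below only prints the three download URLs. The helper name `ensembl_urls` is hypothetical, and the exact file names are an assumption about the Ensembl release-112 directory listings linked above, so check them before downloading. The files are gzip-compressed and may be named differently from the `dataset/` paths used in the examples.

```bash
# Hypothetical helper: print the Ensembl release-112 URLs of the three
# reference files (file names are assumed, verify them in the listings).
ensembl_urls() {
  local base="https://ftp.ensembl.org/pub/release-112"
  printf '%s\n' \
    "${base}/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz" \
    "${base}/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.gtf.gz" \
    "${base}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
}

# List the URLs without downloading anything.
ensembl_urls

# To fetch and decompress them into dataset/ (needs network access), uncomment:
# mkdir -p dataset
# ensembl_urls | xargs -n 1 wget -P dataset
# gunzip dataset/*.gz
```

The download step is left commented out so the snippet can be run safely to inspect the URLs first.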


##### BASIC WORKFLOW

```bash
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/genes.gtf
```

##### WITH PCR AMPLIFICATION

```bash
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--pcr_cycles 2 \
--pcr_dup_rate 0.7 \
--pcr_error_rate 0.00003
```

##### WITH SIMULATED CELL TYPE COUNTS

```bash
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--sim_celltypes true \
--cell_types_annotation dataset/sub_pbmc_cell_type.csv
```

##### WITH PERSONALIZED ERROR MODEL

```bash
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--build_model true \
--fastq_model dataset/sub_pbmc_reads.fq \
--ref_genome dataset/GRCh38-2020-A-genome.fa
```

##### COMPLETE WORKFLOW

```bash
nextflow run main.nf --matrix test_data/matrix.csv \
--transcriptome test_data/transcriptome.fa
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--sim_celltypes true \
--cell_types_annotation dataset/sub_pbmc_cell_type.csv \
--build_model true \
--fastq_model dataset/sub_pbmc_reads.fq \
--ref_genome dataset/GRCh38-2020-A-genome.fa \
--pcr_cycles 2 \
--pcr_dup_rate 0.7 \
--pcr_error_rate 0.00003
```
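
With this many options, a long command line is easy to break (for example by a missing `\`). One possible alternative, sketched below, is to put the same parameters in a YAML file and pass it with Nextflow's `-params-file` option. The file name `params.yaml` is an assumption; the parameter names and values are copied from the complete workflow command.

```bash
# Write the complete-workflow parameters to a YAML file.
# (params.yaml is a hypothetical file name, not part of the repository.)
cat > params.yaml <<'EOF'
matrix: dataset/sub_pbmc_matrice.csv
transcriptome: dataset/Homo_sapiens.GRCh38.cdna.all.fa
features: gene_name
gtf: dataset/GRCh38-2020-A-genes.gtf
sim_celltypes: true
cell_types_annotation: dataset/sub_pbmc_cell_type.csv
build_model: true
fastq_model: dataset/sub_pbmc_reads.fq
ref_genome: dataset/GRCh38-2020-A-genome.fa
pcr_cycles: 2
pcr_dup_rate: 0.7
pcr_error_rate: 0.00003
EOF

# Launch only if Nextflow is installed and main.nf is in the current directory.
if command -v nextflow >/dev/null 2>&1 && [ -f main.nf ]; then
  nextflow run main.nf -params-file params.yaml
fi
```

Keeping the parameters in a file also makes a run easier to reproduce, since the file can be versioned alongside the results.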

## Results
175 changes: 0 additions & 175 deletions index.html

This file was deleted.
