Commit

updated documentation
KevinMenden committed Mar 25, 2021
1 parent c984d91 commit 238c2dc
Showing 5 changed files with 23 additions and 7 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@
* Improved logging and using rich progress bar for training
* Gene subsetting is now done only when merging datasets, which will allow to generate different combinations
of simulated datasets
+* Added `scaden merge` command which allows merging of previously created datasets

### Version 1.0.2

15 changes: 9 additions & 6 deletions docs/blog.md
@@ -2,10 +2,13 @@
Apart from the changelog, this is a more informal section where I will inform about new features
that have been (or will be) implemented in Scaden.

-# Scaden v1.1.0 - Performance Improvements (21.03.2021)
+# Scaden v1.1.0 - Performance Improvements and `scaden merge` tool (21.03.2021)
+
+Scaden v1.1.0 brings significantly improved memory consumption for the data simulation step, which was a frequently asked for feature.
+Now, instead of using about 4 GB of memory to simulate a small dataset, Scaden only uses 1 GB. Memory usage does not increase
+with the number of datasets anymore. This will allow to create datasets from large collections of scRNA-seq datasets without
+needing excessive memory. Furthermore, Scaden now stores the simulated data in `.h5ad` format with the full list of genes.
+This way you can simulate from a scRNA-seq dataset once and combine it with other datasets in the future. To help with this,
+I've added the `scaden merge` command, which takes a list of datasets or a directory with `.h5ad` datasets and creates
+a new training dataset from it.

-Scaden v1.1.0 brings significantly improved memory consumption for the data simulation step, which was a asked for
-quite frequently. Now, instead of using about 4 GB of memory to simulate a small dataset, Scaden only uses 1 GB. This will
-allow to create datasets from large collections of scRNA-seq datasets without needing excessive memory. Furthermore,
-Scaden now stores the simulated data in `.h5ad` format with the full list of genes. This way you can simulate from a
-scRNA-seq dataset once and combine it with other datasets in the future.
2 changes: 2 additions & 0 deletions docs/changelog.md
@@ -8,6 +8,8 @@
* Improved logging and using rich progress bar for training
* Gene subsetting is now done only when merging datasets, which will allow to generate different combinations
of simulated datasets
+* Added `scaden merge` command which allows merging of previously created datasets
+

### Version 1.0.2

4 changes: 4 additions & 0 deletions docs/index.md
@@ -8,3 +8,7 @@ at the [DZNE Tübingen](https://www.dzne.de/en/about-us/sites/tuebingen/) and th

A paper describing Scaden has been published in Science Advances:
[Deep-learning based cell composition analysis from tissue expression profiles](https://advances.sciencemag.org/content/6/30/eaba2619)
+
+For information about how to install Scaden, go to the [Installation](installation.md) section. Look in the [Usage](usage.md)
+section for general help with Scaden usage. In the [Datasets](datasets.md) section you'll find a list of prepared training datasets.
+You can also have a look in the [Blog](blog.md) section, where I summarize new features that are added to Scaden.
8 changes: 7 additions & 1 deletion docs/usage.md
@@ -120,7 +120,13 @@ An example for a pattern would be `*_counts.txt`. This pattern would find the fo

Make sure to include an `*` in your pattern!

-This command will create the artificial samples in the current working directory. You can also specificy an output directory using the `--out` parameter. Scaden will also directly create a .h5ad file in this directory, which is the file you will need for training. By default, this file will be called `data.h5ad`, however you can change the prefix using the `--prefix` flag.
+This command will create the artificial samples in the current working directory. You can also specificy an output directory using the `--out` parameter.
+Scaden will also directly create a .h5ad file in this directory, which is the file you will need for training.
+By default, this file will be called `data.h5ad`, however you can change the prefix using the `--prefix` flag.
+
+Alternatively, you can manually merge `.h5ad` files that have been created with `scaden simulate` from v1.1.0 on using
+the `scaden merge` command. Either point it to a directory of `.h5ad` files, or give it a comma-separated list of files
+to merge. Type `scaden merge --help` for details.

## File Formats
For Scaden to work properly, your input files have to be correctly formatted. As long as you use Scadens inbuilt functionality to generate the training data, you should have no problem
