diff --git a/CHANGELOG.md b/CHANGELOG.md index e1bb857..560b1eb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,7 @@ * Improved logging and using rich progress bar for training * Gene subsetting is now done only when merging datasets, which will allow to generate different combinations of simulated datasets +* Added `scaden merge` command which allows merging of previously created datasets ### Version 1.0.2 diff --git a/docs/blog.md b/docs/blog.md index 93cc34c..8bf375f 100644 --- a/docs/blog.md +++ b/docs/blog.md @@ -2,10 +2,13 @@ Apart from the changelog, this is a more informal section where I will inform about new features that have been (or will be) implemented in Scaden. -# Scaden v1.1.0 - Performance Improvements (21.03.2021) +# Scaden v1.1.0 - Performance Improvements and `scaden merge` tool (21.03.2021) + +Scaden v1.1.0 brings significantly reduced memory consumption for the data simulation step, a frequently requested feature. +Now, instead of using about 4 GB of memory to simulate a small dataset, Scaden only uses 1 GB. Memory usage also no longer increases +with the number of datasets. This makes it possible to create datasets from large collections of scRNA-seq datasets without +needing excessive memory. Furthermore, Scaden now stores the simulated data in `.h5ad` format with the full list of genes. +This way you can simulate from a scRNA-seq dataset once and combine it with other datasets in the future. To help with this, +I've added the `scaden merge` command, which takes a list of datasets or a directory with `.h5ad` datasets and creates +a new training dataset from it. -Scaden v1.1.0 brings significantly improved memory consumption for the data simulation step, which was a asked for -quite frequently. Now, instead of using about 4 GB of memory to simulate a small dataset, Scaden only uses 1 GB. This will -allow to create datasets from large collections of scRNA-seq datasets without needing excessive memory. 
Furthermore, -Scaden now stores the simulated data in `.h5ad` format with the full list of genes. This way you can simulate from a -scRNA-seq dataset once and combine it with other datasets in the future. diff --git a/docs/changelog.md b/docs/changelog.md index e1bb857..e5fa106 100644 --- a/docs/changelog.md +++ b/docs/changelog.md @@ -8,6 +8,8 @@ * Improved logging and using rich progress bar for training * Gene subsetting is now done only when merging datasets, which will allow to generate different combinations of simulated datasets +* Added `scaden merge` command which allows merging of previously created datasets + ### Version 1.0.2 diff --git a/docs/index.md b/docs/index.md index f1d7083..00661f1 100644 --- a/docs/index.md +++ b/docs/index.md @@ -8,3 +8,7 @@ at the [DZNE Tübingen](https://www.dzne.de/en/about-us/sites/tuebingen/) and th A paper describing Scaden has been published in Science Advances: [Deep-learning based cell composition analysis from tissue expression profiles](https://advances.sciencemag.org/content/6/30/eaba2619) + +For information about how to install Scaden, go to the [Installation](installation.md) section. Look in the [Usage](usage.md) +section for general help on using Scaden. In the [Datasets](datasets.md) section you'll find a list of prepared training datasets. +You can also have a look in the [Blog](blog.md) section, where I summarize new features added to Scaden. \ No newline at end of file diff --git a/docs/usage.md b/docs/usage.md index f890b1e..50d5032 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -120,7 +120,13 @@ An example for a pattern would be `*_counts.txt`. This pattern would find the fo Make sure to include an `*` in your pattern! -This command will create the artificial samples in the current working directory. You can also specificy an output directory using the `--out` parameter. Scaden will also directly create a .h5ad file in this directory, which is the file you will need for training. 
By default, this file will be called `data.h5ad`, however you can change the prefix using the `--prefix` flag. +This command will create the artificial samples in the current working directory. You can also specify an output directory using the `--out` parameter. +Scaden will also directly create a `.h5ad` file in this directory, which is the file you will need for training. +By default, this file is called `data.h5ad`; you can change the prefix using the `--prefix` flag. + +Alternatively, you can manually merge `.h5ad` files created with `scaden simulate` (v1.1.0 or later) using +the `scaden merge` command. Either point it to a directory of `.h5ad` files, or give it a comma-separated list of files +to merge. Type `scaden merge --help` for details. ## File Formats For Scaden to work properly, your input files have to be correctly formatted. As long as you use Scadens inbuilt functionality to generate the training data, you should have no problem
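The usage text says `scaden merge` accepts either a directory of `.h5ad` files or a comma-separated list of files. The sketch below shows one way such an argument could be resolved into a concrete list of inputs. It is a hypothetical helper (`resolve_h5ad_inputs` is not part of Scaden's code or API) and only assumes the two input forms described above; check `scaden merge --help` for the real behaviour.

```python
import os
import tempfile
from typing import List


def resolve_h5ad_inputs(arg: str) -> List[str]:
    """Hypothetical helper, not Scaden's implementation.

    Assumption: `arg` is either a directory (every `.h5ad` file directly
    inside it is used) or a comma-separated list of file paths, matching the
    two input forms described for `scaden merge`.
    """
    if os.path.isdir(arg):
        # Directory input: collect regular .h5ad files, sorted so that the
        # merge order is deterministic across runs.
        return sorted(
            os.path.join(arg, name)
            for name in os.listdir(arg)
            if name.endswith(".h5ad") and os.path.isfile(os.path.join(arg, name))
        )
    # List input: split on commas, trimming whitespace and dropping empty entries.
    files = [part.strip() for part in arg.split(",") if part.strip()]
    missing = [f for f in files if not os.path.isfile(f)]
    if missing:
        raise FileNotFoundError(f"Missing input file(s): {', '.join(missing)}")
    return files


if __name__ == "__main__":
    # Small demo: create a temporary directory with two .h5ad placeholders
    # and one unrelated file, then resolve it both ways.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.h5ad", "a.h5ad", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(resolve_h5ad_inputs(tmp))
        a, b = os.path.join(tmp, "a.h5ad"), os.path.join(tmp, "b.h5ad")
        print(resolve_h5ad_inputs(f"{a}, {b}"))
```

Sorting the directory listing is a design choice made here for reproducibility, since `os.listdir` returns entries in arbitrary order.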