CyTOF Dimension Reduction Framework

A framework of dimension reduction and its evaluation for both CyTOF and general-purpose usages.


About

CytofDR is a framework of dimension reduction (DR) and its evaluation for both Cytometry by Time-of-Flight (CyTOF) and general-purpose usages. It allows you to conveniently run many different DR methods in one place and then evaluate them with our extensive evaluation framework to pick your embedding! We aim to provide you with a reliable, extensible, and convenient interface for all your DR needs, for both data analyses and future research!

Key Resources

  • For detailed benchmarks and methodology explanations, please check out our paper in Nature Communications!
  • For an online version of interactive results, please check out the CytofDR Playground.
  • For documentation, please visit our free and detailed documentation page.

Installation

You can install our CytofDR package, which is currently available on PyPI:

pip install CytofDR

Python (>=3.7) is required. This package is architecture agnostic: it should run wherever PyPI or conda is available. All dependencies should be automatically installed. For a list of optional dependencies, please visit our documentation page's detailed Installation Guide.

Installation should take less than a few minutes on most computers with a reasonable network connection.

Conda Installation

I personally recommend using conda to install everything since it's so easy to work with virtual environments. If you need help on how to get conda installed in the first place, take a look here.

To install the package with conda:

conda install -c kevin931 cytofdr -c conda-forge -c bioconda

The core dependencies should automatically install!

Dependencies

Our dependencies are broken down into core dependencies and optional dependencies. Below is a list of core dependencies:

  • scikit-learn
  • numpy
  • scipy
  • umap-learn
  • openTSNE
  • phate
  • annoy
  • matplotlib
  • seaborn

The latest compatible versions of these packages will work with CytofDR, with the exception of numpy: newer versions of numpy can cause issues with conda. If you wish to use PyCytoData, you need to install numpy version 1.20 or 1.21.
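As a quick sanity check for the numpy constraint above, here is a minimal sketch of a version test. The helper name `is_pycytodata_compatible` is hypothetical and not part of CytofDR or PyCytoData; it only encodes the "1.20 or 1.21" rule stated in this section.

```python
def is_pycytodata_compatible(numpy_version: str) -> bool:
    """Return True if the version string is a numpy 1.20.x or 1.21.x release.

    Hypothetical helper for illustration; it is not part of any package.
    """
    # Keep only the major and minor components, e.g. "1.21.6" -> (1, 21).
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) in {(1, 20), (1, 21)}


print(is_pycytodata_compatible("1.21.6"))
print(is_pycytodata_compatible("1.24.0"))
```

In an environment where numpy is installed, you could pass `numpy.__version__` to the helper to check your own setup.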

We also have some optional dependencies which are much trickier to install and manage. Refer to our Installation Guide for more details.

PyCytoData Integration

CytofDR is a member of the PyCytoData Alliance Plus, meaning that we're compatible with the PyCytoData package. The PyCytoData package is used mainly for loading datasets and managing every step of the CyTOF workflow. By creating and maintaining this ecosystem, we hope to create a robust workflow as a one-stop solution for CyTOF practitioners using Python. To install PyCytoData, you can simply use the following command:

pip install PyCytoData

To see how you can perform DR using PyCytoData, this tutorial walks through every step.

Quick Tutorial

CytofDR makes it easy to run many DR methods while also evaluating them for your CyTOF samples. We have a greatly simplified pipeline for your needs. To get started, follow this example:

>>> import numpy as np
>>> from CytofDR import dr
# Load Dataset
>>> expression = np.loadtxt(fname="PATH_To_file", dtype=float, skiprows=1, delimiter=",")
# Run DR and evaluate
>>> results = dr.run_dr_methods(expression, methods=["umap", "pca"])
Running PCA
Running UMAP
>>> results.evaluate(category = ["global", "local", "downstream"])
Evaluating global...
Evaluating local...
Evaluating downstream...
>>> results.rank_dr_methods()
{'PCA': 1.0, 'UMAP': 2.0}
# Save Results
>>> results.save_all_reductions(save_dir="PATH_to_DIR", delimiter=",")
>>> results.save_evaluations(path="PATH_to_FILE")

We strive to make our pipeline as simple as possible with natural language-like method names. Depending on your dataset size, the above example's runtime may vary. PCA is extremely fast, whereas UMAP can take upwards of 10 minutes if the dataset is much larger than 100,000 cells. For the evaluate step, the silhouette score and clustering in the downstream category can take some time, but for a small dataset, evaluation finishes within a few minutes.

For large datasets, we recommend using efficient DR methods and providing your own clustering algorithm if possible.

Example Dataset

We have included an example dataset generated by cytomulate in the /example folder. It is an artificial dataset with 1000 cells that mimics real CyTOF data. To use it, substitute PATH_To_file in the Quick Tutorial with the path to the example dataset exprs.txt, which is in the expression matrix format.
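To illustrate the kind of file the Quick Tutorial's `np.loadtxt` call expects, here is a minimal sketch that writes and reloads a tiny toy matrix: one header row of marker names, then one row per cell. The marker names, values, file name, and comma delimiter are assumptions made for illustration; they are not taken from the bundled exprs.txt.

```python
import os
import tempfile

import numpy as np

# Hypothetical marker names and expression values, for illustration only.
header = "CD3,CD4,CD8"
cells = np.array([[0.1, 2.3, 0.0],
                  [1.5, 0.2, 3.1],
                  [0.0, 0.4, 2.2],
                  [2.8, 1.9, 0.3]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_exprs.txt")
    # Write one header row followed by one row per cell.
    # comments="" keeps numpy from prefixing the header with "# ".
    np.savetxt(path, cells, delimiter=",", header=header, comments="")
    # skiprows=1 drops the header row, mirroring the Quick Tutorial call.
    expression = np.loadtxt(fname=path, dtype=float, skiprows=1, delimiter=",")

# Rows are cells and columns are markers.
print(expression.shape)
```

The resulting `expression` array has the same layout as the one passed to `dr.run_dr_methods` in the Quick Tutorial. If your own file uses a different delimiter, adjust the `delimiter` argument accordingly.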

Examples using PyCytoData

You can use PyCytoData to load your dataset:

>>> from CytofDR import dr
>>> from PyCytoData import FileIO
# Load Dataset
>>> dataset = FileIO.load_expression("PATH_To_file", col_names = True)
# Run DR and evaluate
>>> results = dr.run_dr_methods(dataset.expression_matrix, methods=["umap", "pca"])
Running PCA
Running UMAP

Or with a benchmark dataset:

>>> from CytofDR import dr
>>> from PyCytoData import DataLoader
# Load Dataset
>>> dataset = DataLoader.load_dataset(dataset = "levine13")
# Run DR and evaluate
>>> results = dr.run_dr_methods(dataset.expression_matrix, methods=["umap", "pca"])
Running PCA
Running UMAP

All subsequent workflows remain the same.

Documentation

Of course, there are many more customizations and ways you can use CytofDR. For detailed tutorials and other guides, we suggest that you visit our Official Documentation.

There you will find ways to install our package and get started! Also, we offer tutorials on customizations, working with DR methods, and finally our detailed evaluation framework. We hope that you can find what you need over there!

Latest Release: v0.3.1

This is a minor maintenance update of v0.3.x with updated references and documentation.

Changes and New Features

  • Updated references and citation information in all relevant documentation pages
  • Removed a warning on SAUCIE's installation documentation

Improvements

  • Up-to-date documentation and references

Deprecations

  • (Since v0.2.0) The comparison_classes parameter of the EvaluationMetrics.embedding_concordance method will no longer accept str input.

Issues and Contributions

If you run into issues or have questions, feel free to open an issue here. I'd love to help you out! We also welcome any contributions, but you may want to look at our contribution guide first. Even if you just have an idea, that'll be great!

References

Our preprint "Comparative Analysis of Dimension Reduction Methods for Cytometry by Time-of-Flight Data" is on bioRxiv and can be accessed right here; the paper has since been published in Nature Communications. If you use our package in your research or deployment, a citation of our paper is highly appreciated:

@article{wang2023comparative,
  title={Comparative analysis of dimension reduction methods for cytometry by time-of-flight data},
  author={Wang, Kaiwen and Yang, Yuqiu and Wu, Fangjiang and Song, Bing and Wang, Xinlei and Wang, Tao},
  journal={Nature Communications},
  volume={14},
  number={1},
  pages={1--18},
  year={2023},
  publisher={Nature Publishing Group}
}

For a list of references for the methods, metrics, etc. used in the package, please visit our References page and the bibliography of our paper.