
joshuagryphon/plastid


Welcome to plastid!

For documentation, see our home page on ReadtheDocs.

To run the tests, download the test dataset and unpack it into plastid/test.

Introduction

plastid is a Python library for genomic analysis, in particular of high-throughput sequencing data, with an emphasis on simplicity for users. It was written by Joshua Dunn in Jonathan Weissman's lab at UCSF, initially for analysis of ribosome profiling and RNA-seq data. Versions of it have been used in several publications.

plastid's intended audience includes computational and traditional biologists, software developers, and even those who are new to sequencing analysis. It is released under the BSD 3-Clause license.

This package provides:

  1. A set of scripts that implement common sequencing analyses
  2. A set of classes for exploratory data analysis. These provide simple and consistent interfaces for manipulating genomic features, read alignments, and quantitative data, and they readily interface with existing scientific tools such as the SciPy stack.
  3. Script-writing tools that make it easy to use the objects implemented in plastid.
  4. Extensive documentation, both in source code and at our home page on ReadtheDocs.
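To illustrate the kind of bookkeeping these feature classes take care of, here is a toy sketch in plain Python (no plastid import) of a feature built from several genomic intervals, such as the exons of a spliced transcript. The class `ToySplicedFeature` and its methods are hypothetical and are not plastid's API; see the documentation on ReadtheDocs for plastid's actual classes. Coordinates are assumed to be 0-based and half-open.

```python
from typing import List, Tuple


class ToySplicedFeature:
    """Hypothetical toy class, NOT plastid's API.

    A feature made of several half-open genomic intervals (start, end),
    e.g. the exons of a spliced transcript.
    """

    def __init__(self, chrom: str, strand: str, segments: List[Tuple[int, int]]):
        self.chrom = chrom
        self.strand = strand
        # Keep segments in ascending genomic order regardless of input order
        self.segments = sorted(segments)

    @property
    def length(self) -> int:
        # Length in feature coordinates, i.e. introns are excluded
        return sum(end - start for start, end in self.segments)

    def position_list(self) -> List[int]:
        # Every genomic position covered by the feature, in ascending order
        return [p for start, end in self.segments for p in range(start, end)]

    def feature_to_genomic(self, pos: int) -> int:
        # Map a 0-based offset from the feature's 5' end to a genomic
        # coordinate; on the minus strand the 5' end is the highest coordinate
        positions = self.position_list()
        if not 0 <= pos < len(positions):
            raise IndexError("position outside feature")
        if self.strand == "-":
            positions = positions[::-1]
        return positions[pos]
```

For example, a `+`-strand feature with exons (100, 110) and (200, 205) has length 15, and offset 10 maps to genomic coordinate 200, just past the intron.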

Installation

Bioconda


Bioconda is a channel for the conda package manager with a focus on bioinformatics software. Once you have Bioconda installed, installing plastid is as easy as running:

$ conda create -n plastid plastid
$ conda activate plastid

This will install all of the necessary dependencies for plastid in an isolated environment.

PyPI

plastid can be installed directly from PyPI

$ pip install numpy pysam cython
$ pip install plastid

If you get runtime warnings about numpy versions having changed, about a missing module in pysam, or about an object being the wrong size, try regenerating the included C source files from the original Cython code. To do this, type:

$ pip install --upgrade --install-option='--recythonize' plastid
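When chasing these warnings, it can help to know which versions of plastid and its compiled dependencies are actually installed in the active environment. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the package list is an assumption based on the dependencies named above, and `installed_versions` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed list: plastid plus the dependencies named in the install commands
PACKAGES = ("numpy", "pysam", "Cython", "plastid")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:<10} {ver or 'not installed'}")
```

Running it before and after an upgrade shows whether numpy or pysam changed underneath an existing plastid build.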

Running the tests

  • NOTE: to run the entire test suite you'll first need to download our test dataset and unpack it into plastid/test/data.
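If `tar` or `unzip` is not at hand, the standard library can unpack the archive. A minimal sketch, assuming the downloaded file is in a format `shutil.unpack_archive` recognizes from its extension (e.g. `.tar.gz` or `.zip`); the function name `unpack_test_data` is hypothetical, and you should check the archive's internal layout so that the files end up in plastid/test/data:

```python
import shutil
from pathlib import Path


def unpack_test_data(archive: str, dest: str) -> Path:
    """Unpack a downloaded test-data archive into dest, creating dest if needed.

    shutil infers the archive format from the file extension.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(archive, extract_dir=str(dest_path))
    return dest_path
```

For example, `unpack_test_data("path/to/downloaded-archive.tar.gz", "plastid/test/data")`, where the archive path is a placeholder for wherever you saved the download.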

We use nose as our test runner and tox to test under different versions of Python. To completely control the environment (e.g. compilers, etc.), we recommend running the tests inside the project's Docker container, which includes large data files needed by the tests that are not packaged with plastid by default:

# build & run the Docker image from within the project folder
$ docker build --pull -t plastid .
$ docker run -it --rm plastid

# inside the container, run the tests over all default configurations
root@plastid $ tox

Our tox configuration lets developers run subsets of tests rather than the full suite. All positional arguments are passed through to nosetests:

# run all tests within the plastid.test.unit subpackage
root@plastid $ tox plastid.test.unit

# run tests in two files
root@plastid $ tox plastid.test.unit.genomics.readers.test_bed plastid.test.unit.util.io.test_binary
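The positional arguments in these examples are dotted module names. If you find it easier to start from file paths, a small hypothetical helper can build the equivalent tox command, assuming each dotted name mirrors the file's path relative to the project root, as in the examples here:

```python
import shlex
from pathlib import PurePosixPath
from typing import List, Optional, Sequence


def path_to_test_module(path: str) -> str:
    """Turn a relative test file or package path into a dotted module name.

    e.g. 'plastid/test/unit/util/io/test_binary.py'
      -> 'plastid.test.unit.util.io.test_binary'
    """
    p = PurePosixPath(path)
    if p.suffix == ".py":
        p = p.with_suffix("")  # drop the .py extension
    return ".".join(p.parts)


def tox_command(paths: Sequence[str], envs: Optional[Sequence[str]] = None) -> List[str]:
    """Build a tox argument list; positional args are forwarded to nosetests."""
    cmd = ["tox"]
    if envs:
        cmd += ["-e", ",".join(envs)]  # e.g. -e py36-pinned,py39-latest
    cmd += [path_to_test_module(p) for p in paths]
    return cmd


if __name__ == "__main__":
    print(shlex.join(tox_command(["plastid/test/unit/genomics/readers/test_bed.py"])))
```

The returned list can be printed for copy-pasting, as above, or passed directly to `subprocess.run` from inside the container.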

By default, tox recompiles all C extensions before running the tests. This can be slow. To avoid doing that, set the environment variable PLASTID_NOREBUILD to true:

# run unit tests without rebuilding the C extensions
root@plastid $ env PLASTID_NOREBUILD=true tox plastid.test.unit

Finally, if you want to test in only some of the environments rather than all of them, use standard tox syntax:

# list available test environments
root@plastid $ tox -l
py36-pinned
py36-latest
py39-latest

# run only in 2 selected environments
root@plastid $ tox -e py36-pinned,py39-latest plastid.test.unit

Links & help