Commit

merge
rob-p committed Aug 26, 2024
2 parents a820066 + 407938c commit d0c9883
Showing 6 changed files with 55 additions and 19 deletions.
4 changes: 3 additions & 1 deletion .github/workflows/release.yml
@@ -1,3 +1,5 @@
+# This file was autogenerated by cargo-dist: https://opensource.axo.dev/cargo-dist/
+#
# Copyright 2022-2024, axodotdev
# SPDX-License-Identifier: MIT or Apache-2.0
#
@@ -61,7 +63,7 @@ jobs:
# we specify bash to get pipefail; it guards against the `curl` command
# failing. otherwise `sh` won't catch that `curl` returned non-0
shell: bash
-run: "curl --proto '=https' --tlsv1.2 -LsSf https://github.com/axodotdev/cargo-dist/releases/download/v0.19.1/cargo-dist-installer.sh | sh"
+run: "curl --proto '=https' --tlsv1.2 -LsSf https://github.com/axodotdev/cargo-dist/releases/download/v0.21.1/cargo-dist-installer.sh | sh"
- name: Cache cargo-dist
uses: actions/upload-artifact@v4
with:
11 changes: 7 additions & 4 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "oarfish"
-version = "0.6.0"
+version = "0.6.1"
edition = "2021"
authors = [
"Zahra Zare Jousheghani <zzare@umd.edu>",
@@ -60,7 +60,8 @@ crossbeam = { version = "0.8.4", features = [
sprs = "0.11.1"
minimap2-sys = { version = "0.1.19" }
# rely on minimap2-temp until upstream version is pushed
-minimap2-temp = { version = "0.1.30" }
+# make sure relevant changes are in upstream PR
+minimap2-temp = { version = "0.1.31" }
# alternative sources for dev
#git = "https://github.com/rob-p/minimap2-rs.git", branch = "alignment-score" }
#git = "https://github.com/jguhlin/minimap2-rs.git", branch = "alignment-score" }
@@ -84,7 +85,7 @@ lto = "thin"
# Config for 'cargo dist'
[workspace.metadata.dist]
# The preferred cargo-dist version to use in CI (Cargo.toml SemVer syntax)
-cargo-dist-version = "0.19.1"
+cargo-dist-version = "0.21.1"
# CI backends to support
ci = "github"
# The installers to generate for each app
@@ -95,7 +96,7 @@ targets = [
"x86_64-apple-darwin",
"x86_64-unknown-linux-gnu",
]
-# Publish jobs to run in CI
+# Which actions to run on pull requests
pr-run-mode = "plan"
# Whether to install an updater program
install-updater = false
@@ -104,3 +105,5 @@ install-path = "CARGO_HOME"

[workspace.metadata.dist.github-custom-runners]
aarch64-apple-darwin = "macos-14"
+# don't have linux arm builders on GitHub yet
+# aarch64-unknown-linux-gnu = "buildjet-8vcpu-ubuntu-2204-arm"
4 changes: 0 additions & 4 deletions README.md

This file was deleted.

1 change: 1 addition & 0 deletions README.md
50 changes: 40 additions & 10 deletions docs/index.md
@@ -1,15 +1,46 @@
# oarfish: transcript quantification from long-read RNA-seq data

### Basic usage
`oarfish` is a program, written in Rust (https://www.rust-lang.org/), for quantifying transcript-level expression from long-read (i.e. Oxford Nanopore cDNA and direct RNA and PacBio) sequencing technologies. `oarfish` requires a sample of sequencing reads aligned to the _transcriptome_ (currently not to the genome). It handles multi-mapping reads through the use of probabilistic allocation via an expectation-maximization (EM) algorithm.

`oarfish` is a program, written in [`rust`](https://www.rust-lang.org/), for quantifying transcript-level expression from long-read (i.e. Oxford Nanopore cDNA and direct RNA and PacBio) sequencing technologies. `oarfish` requires a sample of sequencing reads aligned to the *transcriptome* (currently not to the genome). It handles multi-mapping reads through the use of probabilistic allocation via an expectation-maximization (EM) algorithm.
It optionally employs many filters to help discard alignments that may reduce quantification accuracy. Currently, the set of filters applied in `oarfish` is directly derived from the [`NanoCount`](https://github.com/a-slide/NanoCount)[^Gleeson] tool: both the filters that exist and the way their values are set (with the exception of the `--three-prime-clip` filter, which is not set by default in `oarfish` but is in `NanoCount`).

It optionally employs many filters to help discard alignments that may reduce quantification accuracy. Currently, the set of filters applied in `oarfish` is directly derived from the [`NanoCount`](https://github.com/a-slide/NanoCount)[^Gleeson] tool: both the filters that exist and the way their values are set (with the exception of the `--three-prime-clip` filter, which is not set by default in `oarfish` but is in `NanoCount`).
Additionally, `oarfish` provides options to make use of coverage profiles derived from the aligned reads to improve quantification accuracy. The use of this coverage model is enabled with the `--model-coverage` flag. You can read more about `oarfish`[^preprint] in the [preprint](https://www.biorxiv.org/content/10.1101/2024.02.28.582591v1). Please cite the preprint if you use `oarfish` in your work or analysis.

Additionally, `oarfish` provides options to make use of coverage profiles derived from the aligned reads to improve quantification accuracy. The use of this coverage model is enabled with the `--model-coverage` flag. You can read more about `oarfish`[^preprint] in the [preprint](https://www.biorxiv.org/content/10.1101/2024.02.28.582591v1). Please cite the preprint if you use `oarfish` in your work or analysis.
Also, please note that `oarfish` is scientific software in active development. Therefore, please check the [GitHub Release](https://github.com/COMBINE-lab/oarfish/releases) page to make sure that you are using the latest version

Also, please note that `oarfish` is scientific software in active development. Therefore, please check the [GitHub Release](https://github.com/COMBINE-lab/oarfish/releases) page to make sure that you are using the latest version
(also, the `dev` branch should compile from source at all times so feel free to use it, but let us know if you run into any issues).
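
Editor's note (not part of the commit): the probabilistic allocation of multi-mapping reads mentioned above can be illustrated with a toy EM loop. This is a minimal sketch under stated assumptions, not the `oarfish` implementation: the names `Read`, `em`, and `toy_reads` are hypothetical, each alignment carries a single made-up probability, and the filters and coverage model are ignored.

```rust
// Toy EM for allocating multi-mapping reads among transcripts.
// Hypothetical names throughout; this is NOT taken from the oarfish source.

/// One read: the transcripts it aligns to, each with an alignment probability.
type Read = Vec<(usize, f64)>;

/// Run `iters` EM rounds and return the estimated relative abundances.
fn em(reads: &[Read], num_txps: usize, iters: usize) -> Vec<f64> {
    // Start from a uniform abundance estimate.
    let mut abund = vec![1.0 / num_txps as f64; num_txps];
    for _ in 0..iters {
        // E-step: split each read fractionally among its compatible transcripts.
        let mut next = vec![0.0; num_txps];
        for read in reads {
            let denom: f64 = read.iter().map(|&(t, p)| abund[t] * p).sum();
            if denom <= 0.0 {
                continue; // read has no support under the current estimate
            }
            for &(t, p) in read {
                next[t] += abund[t] * p / denom;
            }
        }
        // M-step: renormalize the expected counts into abundances.
        let total: f64 = next.iter().sum();
        if total <= 0.0 {
            break;
        }
        for (a, n) in abund.iter_mut().zip(&next) {
            *a = n / total;
        }
    }
    abund
}

/// Three reads unique to transcript 0, one unique to transcript 1,
/// and two reads that align equally well to both.
fn toy_reads() -> Vec<Read> {
    vec![
        vec![(0, 1.0)],
        vec![(0, 1.0)],
        vec![(0, 1.0)],
        vec![(1, 1.0)],
        vec![(0, 1.0), (1, 1.0)],
        vec![(0, 1.0), (1, 1.0)],
    ]
}

fn main() {
    let abund = em(&toy_reads(), 2, 100);
    println!("estimated abundances: {abund:?}");
}
```

In this toy input the ambiguous reads end up assigned mostly to the transcript that already has more unique support, which is the intuition behind the allocation step.
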
## Installation

`oarfish` can be installed in a variety of ways.

### Precompiled binaries

Binaries are available via [GitHub Releases](https://github.com/COMBINE-lab/oarfish/releases).

You can quickly install the latest release using the following helper script:

```sh
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/COMBINE-lab/oarfish/releases/latest/download/oarfish-installer.sh | sh
```

### Using `cargo`

If you have `cargo` installed, you can install `oarfish` directly from the source code:

```sh
cargo install oarfish
```

You can find the crate on [crates.io](https://crates.io/crates/oarfish).

### Bioconda

`oarfish` is available via [Bioconda](https://anaconda.org/bioconda/oarfish):

```sh
conda install -c bioconda oarfish
```

## Basic usage

Usage information is printed when you pass `-h` at the command line.

@@ -75,7 +106,6 @@ EM:
location of short read quantification (if provided)
```


## Input to `oarfish`

`Oarfish` can accept as input either a `bam` file containing reads aligned to the transcriptome as specified [below](index.md#alignment-based-input), or
@@ -113,19 +143,19 @@ The parameters above should be explained by their relevant help option, but the

**In general**, if you apply a `filter-group`, the group's options are applied first, and then any explicitly provided options override the corresponding options in the `filter-group`.
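
Editor's note (not part of the commit): the precedence rule above can be sketched as a group preset with optional per-field overrides. The struct and field names below (`Filters`, `Overrides`, `min_score`, `min_aligned_len`) and the values are hypothetical illustrations, not `oarfish`'s actual types, option names, or defaults.

```rust
// Sketch of "group first, explicit options override".
// Hypothetical types and values; not oarfish's real filter options.

#[derive(Debug, Clone, PartialEq)]
struct Filters {
    min_score: f64,
    min_aligned_len: u32,
}

/// Options the user passed explicitly; `None` means "not given".
#[derive(Debug, Default)]
struct Overrides {
    min_score: Option<f64>,
    min_aligned_len: Option<u32>,
}

/// Start from the group's values, then let each explicit option win.
fn resolve(group: Filters, explicit: &Overrides) -> Filters {
    Filters {
        min_score: explicit.min_score.unwrap_or(group.min_score),
        min_aligned_len: explicit.min_aligned_len.unwrap_or(group.min_aligned_len),
    }
}

fn main() {
    // Pretend a filter group sets both values...
    let group = Filters { min_score: 0.9, min_aligned_len: 50 };
    // ...and the user explicitly overrides only one of them.
    let explicit = Overrides { min_score: Some(0.95), ..Default::default() };
    println!("{:?}", resolve(group, &explicit));
}
```

Modeling the explicit options as `Option` values keeps "not given" distinct from "given with the default value", which is what makes the override order unambiguous.
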

-### Inferential Replicates
+## Inferential Replicates

`oarfish` has the ability to compute [_inferential replicates_](https://academic.oup.com/nar/article/47/18/e105/5542870) of its quantification estimates. This is performed by bootstrap sampling of the original read mappings, and subsequently performing inference under each resampling. These inferential replicates allow assessing the variance of the point estimate of transcript abundance, and can lead to improved differential analysis at the transcript level, if using a differential testing tool that takes advantage of this information. The generation of inferential replicates is controlled by the `--num-bootstraps` argument to `oarfish`. The default value is `0`, meaning that no inferential replicates are generated. If you set this to some value greater than `0`, the requested number of inferential replicates will be generated. It is recommended, if generating inferential replicates, to run `oarfish` with multiple threads, since replicate generation is highly parallelized. Finally, if replicates are generated, they are written to a [`Parquet`](https://parquet.apache.org/) file whose name starts with the specified output stem and ends with `infreps.pq`.
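
Editor's note (not part of the commit): the bootstrap step described above amounts to drawing as many reads as the sample contains, with replacement, and re-running inference on each draw. The sketch below shows only the resampling; `XorShift64` and `bootstrap_multiplicities` are hypothetical names, the small hand-rolled generator is there only to avoid external crates, and none of this is taken from the `oarfish` source.

```rust
// Sketch of bootstrap resampling over read mappings.
// Hypothetical names; NOT the oarfish implementation.

/// Tiny deterministic xorshift generator (seed must be non-zero).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in 0..n; the slight modulo bias is ignored in this sketch.
    fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Draw one replicate: sample `num_reads` reads with replacement and
/// return how many times each original read was selected.
fn bootstrap_multiplicities(num_reads: usize, rng: &mut XorShift64) -> Vec<u32> {
    let mut counts = vec![0u32; num_reads];
    for _ in 0..num_reads {
        counts[rng.next_index(num_reads)] += 1;
    }
    counts
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let num_reads = 10;
    // Each replicate would then be passed to the quantification step.
    for rep in 0..3 {
        let counts = bootstrap_multiplicities(num_reads, &mut rng);
        println!("replicate {rep}: {counts:?}");
    }
}
```

Because each replicate depends only on its own random draws, replicates can be computed independently of one another, which fits the recommendation above to use multiple threads.
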

-### Output
+## Output

The `--output` option passed to `oarfish` corresponds to a path prefix (this prefix can contain the path separator character, and if it refers to a directory that does not yet exist, that directory will be created). Based on this path prefix, say `P`, `oarfish` will create the following files:

* `P.meta_info.json` - a JSON file containing information about relevant parameters with which `oarfish` was run, and other relevant information from the processed sample apart from the actual transcript quantifications.
* `P.quant` - a tab-separated file listing the quantified targets, as well as information about their length and other metadata. The `num_reads` column provides the estimate of the number of reads originating from each target.
* `P.infreps.pq` - a [`Parquet`](https://parquet.apache.org/) table where each row is a transcript and each column is an inferential replicate, containing the estimated counts for each transcript under each computed inferential replicate. This file is written only if inferential replicates were requested.

-### References
+## References

[^Gleeson]: Josie Gleeson, Adrien Leger, Yair D J Prawer, Tracy A Lane, Paul J Harrison, Wilfried Haerty, Michael B Clark, Accurate expression quantification from nanopore direct RNA sequencing with NanoCount, Nucleic Acids Research, Volume 50, Issue 4, 28 February 2022, Page e19, [https://doi.org/10.1093/nar/gkab1129](https://doi.org/10.1093/nar/gkab1129)

1 change: 1 addition & 0 deletions src/util/aux_counts.rs
@@ -6,6 +6,7 @@ use itertools::izip;
pub struct CountInfo {
pub unique_count: u32,
pub total_count: u32,
+#[allow(dead_code)]
pub expected_count: f64,
}

3 changes: 3 additions & 0 deletions src/util/oarfish_types.rs
@@ -262,8 +262,11 @@ impl<T: sam::alignment::record::Record> From<&T> for AlnInfo {
#[serde(rename_all = "PascalCase")]
pub struct ShortReadRecord {
pub name: String,
+#[allow(dead_code)]
pub length: i32,
+#[allow(dead_code)]
pub effective_length: f64,
+#[allow(dead_code)]
#[serde(rename = "TPM")]
pub tpm: f64,
pub num_reads: f64,