Improve Documentation (#24)
* improve NTA documentation

* add NTA struct documentation

* add doc fixes and README improvements
iblacksand authored Apr 25, 2024
1 parent 4b6c4fd commit 548482e
Showing 7 changed files with 61 additions and 26 deletions.
20 changes: 15 additions & 5 deletions README.md
@@ -4,16 +4,26 @@

Rust implementation of [WebGestaltR](https://github.com/bzhanglab/webgestaltr).

## Notes

This CLI is focused purely on computation. **It does not provide GMT files or HTML reports**. The output of this tool is JSON files containing the results. For a more feature-complete tool, see the original [WebGestaltR](https://bzhanglab.github.io/WebGestaltR/) tool.

## Install

```shell
git clone https://github.com/bzhanglab/webgestalt_rust.git
cd webgestalt_rust
cargo build --release
cargo install webgestalt
```

## Run
## CLI

For help with the CLI, run:

```shell
cargo run --release -- example ora
webgestalt --help
```

Example of running over-representation analysis using `kegg.gmt`, with an interesting list at `int.txt` and a reference list at `ref.txt`. The results are written as JSON to `output.json`.

```shell
webgestalt ora -g kegg.gmt -i int.txt -r ref.txt -o output.json
```
1 change: 1 addition & 0 deletions src/main.rs
@@ -1,3 +1,4 @@
#![doc = include_str!("../README.md")]
use clap::{Args, Parser};
use clap::{Subcommand, ValueEnum};
use owo_colors::{OwoColorize, Stream::Stdout, Style};
2 changes: 1 addition & 1 deletion webgestalt_lib/src/lib.rs
@@ -1,10 +1,10 @@
#![doc = include_str!("../README.md")]
use std::{error::Error, fmt};

pub mod methods;
pub mod readers;
pub mod stat;
pub mod writers;

trait CustomError {
fn msg(&self) -> String;
}
3 changes: 1 addition & 2 deletions webgestalt_lib/src/methods/gsea.rs
@@ -36,7 +36,6 @@ pub struct RankListItem {
}

struct PartialGSEAResult {
// TODO: Look at adding enrichment and normalized enrichment score
set: String,
p: f64,
es: f64,
@@ -296,7 +295,7 @@ fn enrichment_score(
)
}

/// Run GSEA and return a [`Vec<FullGSEAResult`] for all analayte sets.
/// Run GSEA and return a [`Vec<FullGSEAResult>`] for all analyte sets.
///
/// # Parameters
///
2 changes: 1 addition & 1 deletion webgestalt_lib/src/methods/multilist.rs
@@ -59,7 +59,7 @@ pub enum NormalizationMethod {
/// # Parameters
///
/// - `jobs` - A [`Vec<GSEAJob>`] containing all of the separate 'jobs' or analyses to combine
/// - `method` - A [`MultiOmicsMethod`] enum detailing the analysis method to combine the runs together (meta-analysis, mean median ration, or max median ratio).
/// - `method` - A [`MultiListMethod`] enum detailing the analysis method to combine the runs together (meta-analysis, mean median ratio, or max median ratio).
/// - `fdr_method` - [`AdjustmentMethod`] of what FDR method to use to adjust p-values
///
/// # Returns
51 changes: 42 additions & 9 deletions webgestalt_lib/src/methods/nta.rs
@@ -13,12 +13,16 @@ pub struct NTAConfig {
pub reset_probability: f64,
/// A float representing the tolerance for probability calculation
pub tolerance: f64,
/// The [`NTAMethod`] to use for the analysis
pub method: Option<NTAMethod>,
}

/// The methods NTA can use to decide which important nodes to return
#[derive(Debug, Clone)]
pub enum NTAMethod {
/// Find the N most important seeds, where N is the provided [`usize`] value
Prioritize(usize),
/// Find the N most important non-seed nodes, where N is the provided [`usize`] value
Expand(usize),
}

@@ -34,19 +38,32 @@ impl Default for NTAConfig {
}
}

/// Struct representing the NTA results
#[derive(Debug, Serialize)]
pub struct NTAResult {
/// The nodes in the neighborhood. Will always include every seed
pub neighborhood: Vec<String>,
/// The random walk probabilities (score) for the nodes in the neighborhood
pub scores: Vec<f64>,
/// If using the Prioritize method, contains the top N seeds. For the Expand method, this Vec is empty.
pub candidates: Vec<String>,
}

/// Performs network topology-based analysis using random walk to identify important nodes in a network
///
/// ## Parameters
///
/// - `config`: A [`NTAConfig`] struct containing the parameters for the analysis.
///
/// ## Returns
///
/// Returns a [`NTAResult`] struct containing the results from the analysis. The struct is [serde](https://serde.rs/) compatible.
pub fn get_nta(config: NTAConfig) -> NTAResult {
let mut method = config.clone().method;
if method.is_none() {
method = Some(NTAMethod::Expand(10));
}
let mut nta_res = nta(config.clone());
let mut nta_res = process_nta(config.clone());
match method {
Some(NTAMethod::Prioritize(size)) => {
let only_seeds = nta_res
@@ -95,12 +112,16 @@ pub fn get_nta(config: NTAConfig) -> NTAResult {
}
}
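The difference between the two `NTAMethod` variants can be sketched in isolation. The following is a minimal, self-contained illustration and not the code from `nta.rs`: the names `Method`, `Selection`, and `select_nodes` are hypothetical, the body of `get_nta` is elided in this diff, and the choice of what goes into `neighborhood` for each variant is an assumption based only on the doc comments above.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `NTAMethod` (assumption, not the library type).
#[derive(Debug, Clone)]
enum Method {
    /// Rank the seeds and report the top N of them.
    Prioritize(usize),
    /// Keep every seed and add the top N non-seed nodes.
    Expand(usize),
}

/// Hypothetical, simplified counterpart of `NTAResult`.
#[derive(Debug)]
struct Selection {
    neighborhood: Vec<String>,
    scores: Vec<f64>,
    candidates: Vec<String>,
}

/// Pick nodes from `scored` (node name, random walk probability) pairs.
fn select_nodes(
    mut scored: Vec<(String, f64)>,
    seeds: &[String],
    method: Option<Method>,
) -> Selection {
    // Fall back to Expand(10) when no method is given, as `get_nta` does.
    let method = method.unwrap_or(Method::Expand(10));
    let seed_set: HashSet<&str> = seeds.iter().map(String::as_str).collect();
    // Highest probability first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    match method {
        Method::Prioritize(n) => {
            // Assumption: only seeds are reported, and the best N become candidates.
            let only_seeds: Vec<(String, f64)> = scored
                .into_iter()
                .filter(|(name, _)| seed_set.contains(name.as_str()))
                .collect();
            let candidates: Vec<String> =
                only_seeds.iter().take(n).map(|(name, _)| name.clone()).collect();
            let (neighborhood, scores): (Vec<String>, Vec<f64>) =
                only_seeds.into_iter().unzip();
            Selection { neighborhood, scores, candidates }
        }
        Method::Expand(n) => {
            // Assumption: every seed is kept, followed by the best N non-seed nodes.
            let (seed_nodes, others): (Vec<(String, f64)>, Vec<(String, f64)>) = scored
                .into_iter()
                .partition(|(name, _)| seed_set.contains(name.as_str()));
            let (neighborhood, scores): (Vec<String>, Vec<f64>) = seed_nodes
                .into_iter()
                .chain(others.into_iter().take(n))
                .unzip();
            Selection { neighborhood, scores, candidates: Vec::new() }
        }
    }
}
```

Calling `select_nodes(scored, &seeds, None)` takes the `Expand(10)` path, mirroring the default shown in `get_nta`.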

/// Uses random walk to calculate the neighborhood of a set of nodes
/// Returns [`Vec<String>`]representing the nodes in the neighborhood
/// Uses random walk to calculate the probability of each node being walked through.
/// Returns each node name paired with its random walk probability.
///
/// ## Parameters
/// - `config` - A [`NTAConfig`] struct containing the edge list, seeds, neighborhood size, reset probability, and tolerance
///
/// # Parameters
/// - `config` - A [`NTAOptions`] struct containing the edge list, seeds, neighborhood size, reset probability, and tolerance
pub fn nta(config: NTAConfig) -> Vec<(String, f64)> {
/// ## Returns
///
/// Returns a [`Vec<(String, f64)>`] where the [`String`] is the original node name, and the following value is the random walk probability (higher is typically better)
pub fn process_nta(config: NTAConfig) -> Vec<(String, f64)> {
println!("Building Graph");
let unique_nodes = ahash::AHashSet::from_iter(config.edge_list.iter().flatten().cloned());
let mut node_map: ahash::AHashMap<String, usize> = ahash::AHashMap::default();
@@ -135,20 +156,32 @@ pub fn nta(config: NTAConfig) -> Vec<(String, f64)> {
.collect()
}

/// Calculates the probability that each node will be visited when starting from one of the seeds
///
/// ## Parameters
///
/// - `adj_matrix` - A 2d adjacency matrix, where 1 means the nodes at the row and column indices are connected
/// - `seed_indices` - A [`Vec<usize>`] of the indices of the seeds (starting points)
/// - `r` - A [`f64`] of the reset probability (default in WebGestaltR is 0.5)
/// - `tolerance` - The tolerance/threshold value in [`f64`] (WebGestaltR default is `1e-6`)
///
/// ## Returns
///
/// Returns a 1d array containing the probability for each node
fn random_walk_probability(
adj_matrix: &ndarray::Array2<f64>,
node_indices: &Vec<usize>,
seed_indices: &Vec<usize>,
r: f64,
tolerance: f64,
) -> ndarray::Array1<f64> {
let num_nodes = node_indices.len() as f64;
let num_nodes = seed_indices.len() as f64;
let de = adj_matrix.sum_axis(Axis(0));
// de to 2d array
let de = de.insert_axis(Axis(1));
let temp = adj_matrix.t().div(de);
let w = temp.t();
let mut p0 = ndarray::Array1::from_elem(w.shape()[0], 0.0);
for i in node_indices {
for i in seed_indices {
p0[*i] = 1.0 / num_nodes;
}
let mut pt = p0.clone();
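The rest of `random_walk_probability` is elided in this diff. As a rough illustration of the technique the doc comment describes, here is a minimal sketch of a random walk with restart using plain `Vec`s instead of `ndarray`. The function name, the L1 convergence check, the iteration cap, and the handling of isolated nodes are all assumptions made for this example; only the column normalization and the uniform start vector over the seeds follow the visible lines.

```rust
/// Random walk with restart on a dense adjacency matrix (illustrative sketch).
///
/// `adj[i][j]` is 1.0 when nodes `i` and `j` are connected, `r` is the reset
/// probability, and iteration stops once the L1 change drops below `tolerance`.
fn random_walk_with_restart(
    adj: &[Vec<f64>],
    seed_indices: &[usize],
    r: f64,
    tolerance: f64,
) -> Vec<f64> {
    // Assumption: a cap so the loop always terminates, even if it never converges.
    const MAX_ITERATIONS: usize = 10_000;
    let n = adj.len();

    // Column sums, i.e. node degrees for an unweighted, undirected graph.
    let mut col_sums = vec![0.0; n];
    for row in adj {
        for (j, value) in row.iter().enumerate() {
            col_sums[j] += value;
        }
    }

    // Column-normalized transition matrix. Assumption: columns of isolated
    // nodes are left at zero instead of dividing by zero.
    let w: Vec<Vec<f64>> = adj
        .iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .map(|(j, v)| if col_sums[j] > 0.0 { v / col_sums[j] } else { 0.0 })
                .collect()
        })
        .collect();

    // Start vector: probability mass spread evenly over the seeds.
    let mut p0 = vec![0.0; n];
    for &i in seed_indices {
        p0[i] = 1.0 / seed_indices.len() as f64;
    }

    let mut pt = p0.clone();
    for _ in 0..MAX_ITERATIONS {
        // One step: walk along an edge with probability 1 - r, restart with r.
        let next: Vec<f64> = (0..n)
            .map(|i| {
                let walk: f64 = w[i].iter().zip(&pt).map(|(wij, pj)| wij * pj).sum();
                (1.0 - r) * walk + r * p0[i]
            })
            .collect();
        let change: f64 = next.iter().zip(&pt).map(|(a, b)| (a - b).abs()).sum();
        pt = next;
        if change < tolerance {
            break;
        }
    }
    pt
}
```

On the three-node path graph 0 - 1 - 2 with node 0 as the only seed and `r = 0.5`, the fixed point works out by hand to 7/12, 1/3, and 1/12, so the seed keeps the largest share and probability decays with distance from it.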
8 changes: 0 additions & 8 deletions webgestalt_lib/src/methods/ora.rs
@@ -53,14 +53,6 @@ pub fn ora_p(m: i64, j: i64, n: i64, k: i64) -> f64 {
/// - `interest_list` - A [`AHashSet<String>`] of the interesting analytes
/// - `reference` - A [`AHashSet<String>`] of the reference list
/// - `gmt` - A [`Vec<Item>`] of the gmt file
///
/// # Panics
///
/// Panics if the [`Arc`] struggles to lock during parallelization.
///
/// # Errors
///
/// This function will return an error if .
pub fn get_ora(
interest_list: &AHashSet<String>,
reference: &AHashSet<String>,
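The body of `get_ora` is not shown in this diff. For orientation, over-representation analysis is commonly scored with an upper-tail hypergeometric test, which the sketch below illustrates for a single analyte set. Everything here is an assumption for the sake of the example: the function names, the use of `std::collections::HashSet` rather than `AHashSet`, and the restriction of both lists to the reference. It makes no claim about how the parameters of `ora_p(m, j, n, k)` map onto these quantities.

```rust
use std::collections::HashSet;

/// Natural log of the binomial coefficient C(n, k), for k <= n.
fn ln_choose(n: u64, k: u64) -> f64 {
    (1..=k)
        .map(|i| ((n - k + i) as f64).ln() - (i as f64).ln())
        .sum()
}

/// P(X >= overlap) when drawing `draws` items without replacement from a
/// population of `population` items, `successes` of which belong to the set.
fn hypergeometric_upper_tail(population: u64, successes: u64, draws: u64, overlap: u64) -> f64 {
    let ln_total = ln_choose(population, draws);
    (overlap..=successes.min(draws))
        // Skip impossible outcomes (more misses drawn than misses available).
        .filter(|&i| draws - i <= population - successes)
        .map(|i| {
            (ln_choose(successes, i) + ln_choose(population - successes, draws - i) - ln_total)
                .exp()
        })
        .sum::<f64>()
        .min(1.0)
}

/// Hypothetical single-set ORA p-value (not the library's `get_ora`).
fn ora_p_value(
    interest: &HashSet<String>,
    reference: &HashSet<String>,
    set: &HashSet<String>,
) -> f64 {
    // Assumption: only analytes present in the reference are counted.
    let set_in_ref = set.intersection(reference).count() as u64;
    let interest_in_ref = interest.intersection(reference).count() as u64;
    let overlap = interest
        .iter()
        .filter(|g| reference.contains(*g) && set.contains(*g))
        .count() as u64;
    hypergeometric_upper_tail(reference.len() as u64, set_in_ref, interest_in_ref, overlap)
}
```

With a reference of 10 analytes, a set of 5, and an interesting list equal to that set, the only outcome in the tail is a full overlap, giving 1 / C(10, 5) = 1/252.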
