refactor
dagou committed Sep 13, 2024
1 parent 0c0681c commit e5bd212
Showing 43 changed files with 740 additions and 2,353 deletions.
139 changes: 0 additions & 139 deletions .github/workflows/rust.yml

This file was deleted.

1 change: 1 addition & 0 deletions .gitignore
@@ -19,3 +19,4 @@ out_dir/
 slurm.sh
 downloads/
 test_database/
+chunk/
3 changes: 1 addition & 2 deletions .vscode/settings.json
@@ -1,6 +1,5 @@
 {
 "rust-analyzer.linkedProjects": [
-"./kr2r/Cargo.toml",
-"./kr2r/Cargo.toml"
+"./Cargo.toml",
 ]
 }
43 changes: 40 additions & 3 deletions Cargo.toml
@@ -1,6 +1,43 @@
-[workspace]
-members = ["kr2r", "seqkmer"]
-resolver = "2"
+[package]
+name = "kun_peng"
+version = "0.7.1"
+edition = "2021"
+authors = ["eric9n@gmail.com"]
+description = "Kun-peng: an ultra-fast, low-memory footprint and accurate taxonomy classifier for all"
+license = "MIT"
+repository = "https://github.com/eric9n/Kun-peng"
+keywords = ["bioinformatics", "metagenomics", "microbiome", "exposome"]
+
+# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+
+[[bin]]
+name = "kun_peng"
+path = "src/bin/kun.rs"
+
+[features]
+double_hashing = []
+exact_counting = []
+
+[dependencies]
+seqkmer = "0.1.1"
+clap = { version = "4.4.10", features = ["derive"] }
+hyperloglogplus = { version = "0.4.1", features = ["const-loop"] }
+seahash = "4.1.0"
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+byteorder = "1.4"
+walkdir = "2"
+rayon = "1.8"
+libc = "0.2"
+regex = "1.5.4"
+flate2 = "1.0"
+dashmap = { version = "6.0.1", features = ["rayon"] }
+num_cpus = "1.13.1"
+
+[dev-dependencies]
+criterion = "0.5.1"
+twox-hash = "1.6.3"
+farmhash = { version = "1.1.5" }

 [profile.release]
 lto = true
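
The two empty entries under `[features]` are plain compile-time switches: they pull in no extra dependencies and only matter where the source checks for them. What each flag gates inside kun_peng is not visible in this diff, so the following is only a sketch of the general mechanism. The function `probe_step` and both branches are hypothetical; only the feature name `double_hashing` comes from the manifest.

```rust
/// Hypothetical helper: picks the step used when probing a hash table.
/// Only the feature name `double_hashing` is taken from Cargo.toml; the
/// logic is an illustrative assumption, not kun_peng's implementation.
fn probe_step(hash: u64) -> u64 {
    if cfg!(feature = "double_hashing") {
        // Double hashing: derive an odd step from the upper bits of the hash.
        (hash >> 32) | 1
    } else {
        // Linear probing: always advance by one slot.
        1
    }
}

fn main() {
    // Without `--features double_hashing` the linear branch is compiled in.
    println!("step = {}", probe_step(0xDEAD_BEEF_0000_0001));
}
```

With Cargo, a flag declared this way is switched on at build time, for example `cargo build --release --features double_hashing`.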
17 changes: 12 additions & 5 deletions README.md
@@ -1,12 +1,12 @@
-# Kun-peng <img src="./kr2r/docs/KunPeng.png" alt="Kun-peng Logo" align="right" width="50"/>
+# Kun-peng <img src="./docs/KunPeng.png" alt="Kun-peng Logo" align="right" width="50"/>

-[![](https://img.shields.io/badge/doi-waiting-yellow.svg)]() [![](https://img.shields.io/badge/release%20version-0.7.0-green.svg)](https://github.com/eric9n/Kun-peng/releases)
+[![](https://img.shields.io/badge/doi-waiting-yellow.svg)]() [![](https://img.shields.io/badge/release%20version-0.7.1-green.svg)](https://github.com/eric9n/Kun-peng/releases)

 Here, we introduce Kun-peng, an ultra-memory-efficient metagenomic classification tool (Fig. 1). Inspired by Kraken2's k-mer-based approach, Kun-peng employs algorithms for minimizer generation, hash table querying, and classification. The cornerstone of Kun-peng's memory efficiency lies in its unique ordered block design for the reference database. This strategy dramatically reduces memory usage without compromising speed, enabling Kun-peng to be executed on both personal computers and HPCP for most databases. Moreover, Kun-peng incorporates an advanced sliding window algorithm for sequence classifications to reduce the false-positive rates. Finally, Kun-peng supports parallel processing algorithms to further bolster its speed. Kun-peng offers two classification modes: Memory-Efficient Mode (Kun-peng-M) and Full-Speed Mode (Kun-peng-F). Remarkably, Kun-peng-M achieves a comparable processing time to Kraken2 while using less than 10% of its memory. Kun-peng-F loads all the database blocks simultaneously, matching Kraken2's memory usage while surpassing its speed. Notably, Kun-peng is compatible with the reference database built by Kraken2 and the associated abundance estimate tool Bracken<sub>1</sub>, making the transition from Kraken2 effortless. The name "Kun-peng" was derived from Chinese mythology and refers to a creature transforming between a giant fish (Kun) and a giant bird (Peng), reflecting the software's flexibility in navigating complex metagenomic data landscapes.


 <div style="text-align: center;">
-<img src="./kr2r/docs/Picture1.png" alt="Workflow of Kun-peng" style="width: 50%;">
+<img src="./docs/Picture1.png" alt="Workflow of Kun-peng" style="width: 50%;">
 <p><strong>Fig. 1. Overview of the algorithms of Kun-peng.</strong></p>
 </div>

@@ -19,7 +19,7 @@ We constructed a standard database using the complete RefSeq genomes of archaeal
 Kun-peng offers two modes for taxonomy classification: Memory-Efficient Mode (Kun-peng-M) and Full-Speed Mode (Kun-peng-F), with identical classification results. Kun-peng-M matches Kraken2's processing time and uses 57.0 ± 2.25 % of Centrifuge's time (Fig. 2d). However, Kun-peng-M requires only 4.5 ± 1.1 GB peak memory, which is 7.96 ± 0.19 % and 6.31 ± 0.15 % of Kraken2 and Centrifuge's peak memory, respectively (Fig. 2d). Compared to Kraken2, Kun-peng-F consumes the same memory but requires only 67.2 ± 4.57 % of the processing time. Compared to Centrifuge, Kun-peng-F uses 77.9 ± 0.22 % memory while requiring only 38.8 ± 4.25 % of its processing time (Fig. 2d). Remarkably, with an ultra-low memory requirement, Kun-peng-M can even operate on most personal computers when the standard reference database is used (Fig. 2e).

 <div style="text-align: center;">
-<img src="./kr2r/docs/Picture2.png" alt="Workflow of Kun-peng" style="width: 50%;">
+<img src="./docs/Picture2.png" alt="Workflow of Kun-peng" style="width: 50%;">
 <p><strong>Fig. 2. Performance benchmark of Kun-peng against other metagenomic classifiers.</strong></p>
 </div>

@@ -57,6 +57,13 @@ source ~/.bashrc

 For macOS users:

+### Homebrew
+```bash
+brew install eric9n/tap/kun_peng
+```
+
+### Download binary
+
 ```bash
 # Replace X.Y.Z with the latest version number
 VERSION=vX.Y.Z
@@ -169,7 +176,7 @@ This will build the kr2r and ncbi project in release mode.
 Next, run the example script that demonstrates how to use the `kun_peng` binary. Execute the following command from the root of the workspace:

 ``` sh
-cargo run --release --example build_and_classify --package kun_peng
+cargo run --release --example build_and_classify
 ```

 This will run the build_and_classify.rs example located in the kr2r project's examples directory.
File renamed without changes
File renamed without changes
File renamed without changes
@@ -4,10 +4,7 @@ use std::process::Command;

 fn main() {
     // Define the paths and directories
-    let workspace_root = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
-        .parent()
-        .unwrap()
-        .to_path_buf();
+    let workspace_root = PathBuf::from(env!("CARGO_MANIFEST_DIR")).to_path_buf();
     let kr2r_binary = workspace_root.join("target/release/kun_peng");
     let data_dir = workspace_root.join("data");
     let test_dir = workspace_root.join("test_database");
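
The hunk above follows from the package moving out of `kr2r/` to the repository root: `CARGO_MANIFEST_DIR` points at the directory holding the package's `Cargo.toml`, so the `.parent()` hop is only needed when the package sits one level below the root. A minimal sketch of that difference, assuming the two layouts described in this commit; the helper `repo_root` and the example paths are hypothetical and not part of the change:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive the repository root from a manifest directory.
/// `nested` is true when the package lives one level below the root
/// (the old `kr2r/` layout) and false when it sits at the root itself.
fn repo_root(manifest_dir: &Path, nested: bool) -> PathBuf {
    if nested {
        // Old layout: <root>/kr2r/Cargo.toml, so step up one directory.
        manifest_dir
            .parent()
            .expect("a nested manifest dir has a parent")
            .to_path_buf()
    } else {
        // New layout: <root>/Cargo.toml, the manifest dir is already the root.
        manifest_dir.to_path_buf()
    }
}

fn main() {
    let old = repo_root(Path::new("/repo/kr2r"), true);
    let new = repo_root(Path::new("/repo"), false);
    // Both layouts resolve to the same root, hence the same binary path.
    println!("{}", old.join("target/release/kun_peng").display());
    println!("{}", new.join("target/release/kun_peng").display());
}
```

Keeping the old `.parent()` call after the move would have pointed `target/release/kun_peng`, `data`, and `test_database` one directory above the repository.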
40 changes: 0 additions & 40 deletions kr2r/Cargo.toml

This file was deleted.
