Skip to content

Embedding-based indexing for compact storage, rapid querying, and curation of bacterial pan-genomes

License

Notifications You must be signed in to change notification settings

pg-space/panspace

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

67 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Embedding bacterial assemblies

The goal is to build an embedding space for bacterial sequences and use them as an index of dense vector to perform fast queries using faiss

CLI panspace

panspace is a library based on tensorflow and faiss index. It provides commands for creating FCGR from kmer counts, train an autoencoder, extract Encoder and Decoder from a trained model, and create and query an Index of embeddings.

git clone git@github.com:pg-space/panspace.git
cd panspace

mamba env create -n panspace -f condaenv.yaml
mamba activate panspace

panspace --help 

Usage: panspace [OPTIONS] COMMAND [ARGS]...                                                                               
                                                                                                                           
 🐱 Welcome to panspace, a tool for Indexing and Querying a pan-genome in an embedding space                               
                                                                                                                           
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.                                                 │
│ --show-completion             Show completion for the current shell, to copy it or customize the installation.          │
│ --help                        Show this message and exit.                                                               │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ data-curation    Find outliers and mislabaled samples.                                                                  │
│ docs             Open documentation webpage.                                                                            │
│ fcgr             Create FCGRs from fasta file or from txt file with kmers and counts.                                   │
│ index            Create and query index. Utilities to test index.                                                       │
│ stats-assembly   N50, number of contigs, avg length, total length.                                                      │
│ trainer          Train Autoencoder/Metric Learning. Utilities.                                                          │
│ utils            Extract info from text or log files                                                                    │
│ what-to-do       🐱 If you are new here, check this step-by-step guide                                                  │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

1. Train Autoencoder and create index

2. Query index


Author

panspace is developed by Jorge Avila

Releases

No releases published

Packages

No packages published