- Bugs in domain splitting with a blank chain ID and Mac multithreading are fixed.
- Python packaging issues are fixed.
- Structures can now be split into domains with Chainsaw before searching, with each domain searched separately. This makes Progres suitable for use with multi-domain structures.
- The whole PDB, split into domains with Chainsaw, is made available to search against.
- Hetero atoms are now ignored during file reading.
- Example files are added for searching and database embedding.
- The `score` mode is added to calculate the Progres score between two structures.
- Incomplete downloads are handled during setup.
- The environment variable `PROGRES_DATA_DIR` can be used to change where the downloaded data is stored.
- A Dockerfile is added.
- Searching on GPU is made more memory efficient.
- Bugs when running on Windows are fixed.
- The AlphaFold database TED domains are made available to search against, with FAISS used for fast searching.
- Pre-embedded databases are stored as Float16 to reduce disk usage.
- Datasets and scripts for benchmarking (including for other methods), FAISS index generation and training are made available.
- Change model architecture to use 6 EGNN layers and tau torsion angles, making it faster and SE(3)-invariant rather than E(3)-invariant.
- The AlphaFold models for 21 model organisms are made available to search against.
- The trained model and pre-embedded databases are downloaded from Zenodo rather than GitHub when first running the software.
- Fix data download.
- Add ECOD database.
- Use versioned model directory.
- Add einops dependency.
- Add code for ECOD database.
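The note above about ignoring hetero atoms during file reading can be illustrated with a minimal sketch. It is not the package's actual reader: the function `read_ca_coords` and the helper `pdb_line` are hypothetical names, and the sketch assumes standard fixed-width PDB columns. It shows why the filter matters: a calcium ion is stored as a `HETATM` record whose atom name is also `CA`, so a reader that only checks the atom name would mistake it for a C-alpha atom.

```python
def read_ca_coords(pdb_lines):
    """Collect C-alpha coordinates from ATOM records, skipping HETATM records."""
    coords = []
    for line in pdb_lines:
        # Columns 1-6 hold the record name; HETATM lines fail this check.
        if not line.startswith("ATOM  "):
            continue
        # Columns 13-16 hold the atom name.
        if line[12:16].strip() != "CA":
            continue
        # Columns 31-54 hold x, y and z as three 8-character fields.
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def pdb_line(record, serial, atom, res, x, y, z):
    # Hypothetical helper that lays out the fixed-width columns for the demo.
    return f"{record:<6}{serial:>5} {atom:<4} {res:>3} A{1:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"


example = [
    pdb_line("ATOM", 1, "N", "ALA", 11.104, 6.134, -6.504),
    pdb_line("ATOM", 2, "CA", "ALA", 11.639, 6.071, -5.147),
    pdb_line("HETATM", 3, "CA", "CA", 1.0, 2.0, 3.0),  # calcium ion, not a C-alpha
]
print(read_ca_coords(example))
```

Only the second record is kept; the backbone nitrogen is dropped by the atom-name check and the calcium ion by the record-name check.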
Initial release of the `progres` Python package for fast protein structure searching using structure graph embeddings.
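The `PROGRES_DATA_DIR` entry above describes redirecting where downloaded data is stored. As a rough sketch of how such an override is commonly resolved (not the package's own code), the snippet below reads the variable and falls back to a default. The function name `resolve_data_dir` and the fallback path `DEFAULT_DATA_DIR` are assumptions made for illustration; the package's real default location is not stated in these notes.

```python
import os
from pathlib import Path

# Hypothetical fallback location; the package's real default is not stated here.
DEFAULT_DATA_DIR = Path.home() / ".progres_data"


def resolve_data_dir(environ=os.environ):
    """Return the directory to use for downloaded data."""
    override = environ.get("PROGRES_DATA_DIR")
    # An unset or empty variable falls back to the default.
    return Path(override) if override else DEFAULT_DATA_DIR


# Passing a mapping explicitly makes the behaviour easy to check without
# modifying the real process environment.
print(resolve_data_dir({"PROGRES_DATA_DIR": "/scratch/progres"}))
print(resolve_data_dir({}))
```

In a shell, the variable would typically be set before running the software, for example with `export PROGRES_DATA_DIR=/scratch/progres` on Linux or macOS.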