The SWORD2 partitioning algorithm produces multiple alternative domain assignments for a given protein structure. This unique approach handles ambiguous protein structure partitioning, admitting several solutions. The decomposition of the protein structure into domains is achieved through the hierarchical clustering of Protein Units, evolutionarily preserved structural descriptors at the interface between secondary structures and domains.
This is the repository of the standalone version of the corresponding webserver:
dsimb.inserm.fr/SWORD2
(Easy & recommanded)
Install the conda environment using the environment.yml
file:
conda env create -f environment.yml
or
mamba env create -f environment.yml
Compile necessary dependencies:
bash install.sh
Otherwise, a docker is also available:
docker pull dsimb/sword2:latest
# This will mount the local `results` directory inside the `/output` Docker image directory and run SWORD2 as you (local user)
# so that the results files are owned by you and not root.
(sudo) docker run --rm -e USER_ID=$(id -u) -e GROUP_ID=$(id -g) -v $(pwd)/results:/output dsimb/sword2:latest -p 1jx4 -o /output
First, activate the working environment:
conda activate sword2
Then, launch SWORD2:
./SWORD2.py -p 1jx4 -o results
./SWORD2.py -u Q76EI6 -o results
./SWORD2.py -m MGYP000936678158 -o results
./SWORD2.py -i ./structure.pdb -o results
./SWORD2.py -i ./structure.pdb -d 2 -o results
./SWORD2.py -p 1jx4 -o results --disable-energy --disable-plots
SWORD2 will have a peak memory usage of 150-200 Mb for average protein sizes of 100-300 residues. The largest proteins, for example 2700 residues can take up to 1.5 Gb memory.
To get the full help:
$ ./SWORD2.py --help
usage: SWORD2.py [-h] [--version] (-u UNIPROT_ID | -m MGNIFY_ID | -p PDB_ID | -i INPUT_FILE) [-c PDB_CHAIN] [-d MODEL] [-x CPU] [-e] [-l] -o OUTPUT
SWORD2: SWift and Optimized Recognition of protein Domains.
The SWORD2 partitioning algorithm produces multiple alternative
domain assignments for a given protein structure.
This unique approach handles ambiguous protein structure partitioning,
admitting several solutions. The decomposition of the protein structure
into domains is achieved through the hierarchical clustering of Protein Units,
evolutionarily preserved structural descriptors at the interface between
secondary structures and domains.
options:
-h, --help show this help message and exit
--version show program's version number and exit
-u UNIPROT_ID, --uniprot-id UNIPROT_ID
AlphaFold Uniprot Accession Id.
-m MGNIFY_ID, --mgnify-id MGNIFY_ID
MGnify Id for the ESM Metagenomic Atlas.
-p PDB_ID, --pdb-id PDB_ID
PDB id to download from the PDB database.
-i INPUT_FILE, --input-file INPUT_FILE
Path to an input PDB or mmCIF file.
optional arguments:
-c PDB_CHAIN, --pdb-chain PDB_CHAIN
PDB chain. If not specified, the first chain in the PDB file will be used.
-d MODEL, --model MODEL
Model to parse. Especially useful for NMR files which contain several models. Default is 1.
-x CPU, --cpu CPU Number of CPUs to use. Default all (0). Max on this computer is: 10
-e, --disable-energies
Disable the calculation of pseudo-energy of domains and PUs.
-l, --disable-plots Disable the generation of contact probability matrices plots.
required arguments:
-o OUTPUT, --output OUTPUT
Output directory. Results will be generated inside in a dedicated directory named after OUTPUT/PDBID_CHAIN/
Example:
$ conda activate sword2
(sword2) $ ./SWORD2.py -p 1jx4 -o results
2024/09/19 11:14:27 INFO Fetch PDB ID: 1jx4
2024/09/19 11:14:28 INFO No chain specified. Using first chain 'A' in the PDB file.
2024/09/19 11:14:28 INFO
2024/09/19 11:14:28 INFO >>> 1jx4_A (335 aa)
2024/09/19 11:14:28 INFO >>> Estimated runtime: 1 minutes and 6 seconds
2024/09/19 11:14:28 INFO >>> Using 10 cpus
2024/09/19 11:14:28 INFO
2024/09/19 11:14:28 INFO Write a clean version of the PDB: remove non standard residues
2024/09/19 11:14:28 INFO Launch SWORD
2024/09/19 11:14:34 INFO Parse SWORD output
2024/09/19 11:14:34 INFO Calculate pseudo-energies of Domains
2024/09/19 11:15:27 INFO Write the SWORD results
2024/09/19 11:15:27 INFO Write Peeling results
2024/09/19 11:15:48 INFO Generate histogram of SWORD2 domains consistency
2024/09/19 11:15:49 INFO Generate contact probability matrices
2024/09/19 11:15:54 INFO Calculate junctions consistencies
2024/09/19 11:15:54 INFO Clean and prepare results
2024/09/19 11:15:54 INFO Results can be found here: results/1jx4_A
2024/09/19 11:15:54 INFO Total runtime: 87 seconds
An easily parseable output in JSON format is generated for easier downstream tasks/analysis: SWORD2_summary.json
Users can now run multiple SWORD2 jobs in parallel. Here is an example script:
import subprocess
from concurrent.futures import ThreadPoolExecutor
# List of 5 PDB codes
pdb_list = ['1jx4', '2c78', '1f5n', '1a8y', '1b89']
# Function to run a SWORD2 job
def run_sword2(pdb_code):
command = f'./SWORD2.py -p {pdb_code} -o results'
try:
subprocess.run(command, shell=True, check=True)
print(f"Job for {pdb_code} completed successfully.")
except subprocess.CalledProcessError as e:
print(f"Error running job for {pdb_code}: {e}")
# Run 5 jobs using ThreadPoolExecutor
def run_sword2_jobs():
with ThreadPoolExecutor(max_workers=5) as executor: # Adjust max_workers based on your system's capabilities
executor.map(run_sword2, pdb_list)
if __name__ == "__main__":
run_sword2_jobs()