Add example CSV file

I meant to rename this file earlier but accidentally removed it instead.
gcorso · Apr 9, 2024 · e259429
1 parent 6f5d4b1

Showing 3 changed files with 7 additions and 4 deletions.
diff --git a/.gitignore b/.gitignore
@@ -125,10 +125,10 @@ local_config_inference2.yml
 .p.npy
 .score.npy
 # this ignores everything in data except for the file
-!/data
 /data/*
+!/data
 !/data/splits
-!/data/protein_ligand_example_csv.csv
+!/data/protein_ligand_example*
 !/data/testset_csv.csv
 !/data/INDEX_general_PL_data.2020
 test_run

diff --git a/README.md b/README.md
@@ -87,11 +87,11 @@ The protein inputs need to be `.pdb` files or sequences that will be folded with
 For a single complex: specify the protein with `--protein_path protein.pdb` or `--protein_sequence GIQSYCTPPYSVLQDPPQPVV` and the ligand with `--ligand ligand.sdf` or `--ligand "COc(cc1)ccc1C#N"`
 
 For many complexes: create a csv file with paths to proteins and ligand files or SMILES. It contains as columns `complex_name` (name used to save predictions, can be left empty), `protein_path` (path to `.pdb` file, if empty uses sequence), `ligand_description` (SMILE or file path)  and `protein_sequence` (to fold with ESMFold in case the protein_path is empty).
-An example .csv is at `data/protein_ligand_example_csv.csv` and you would use it with `--protein_ligand_csv protein_ligand_example_csv.csv`.
+An example .csv is at `data/protein_ligand_example.csv` and you would use it with `--protein_ligand_csv protein_ligand_example.csv`.
 
 And you are ready to run inference:
 
-    python -m inference --config default_inference_args.yaml  --protein_ligand_csv data/protein_ligand_example_csv.csv --out_dir results/user_predictions_small 
+    python -m inference --config default_inference_args.yaml  --protein_ligand_csv data/protein_ligand_example.csv --out_dir results/user_predictions_small 
 
 When providing the `.pdb` files you can run DiffDock also on CPU, however, if possible, we recommend using a GPU as the model runs significantly faster. Note that the first time you run DiffDock on a device the program will precompute and store in cache look-up tables for SO(2) and SO(3) distributions (typically takes a couple of minutes), this won't be repeated in following runs.  
 

diff --git a/data/protein_ligand_example.csv b/data/protein_ligand_example.csv
@@ -0,0 +1,3 @@
+complex_name,protein_path,ligand_description,protein_sequence
+1a0q,data/1a0q/1a0q_protein_processed.pdb,data/1a0q/1a0q_ligand.sdf,
+1a0q_custom,data/1a0q/1a0q_protein_processed.pdb,COc(cc1)ccc1C#N,
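
The README text in this commit describes the CSV columns, so a quick sanity check of a file before running inference may be useful. The following is a minimal sketch using only the Python standard library; `check_rows` and `REQUIRED_COLUMNS` are hypothetical names introduced here for illustration and are not part of the repository. It assumes, per the README, that each row needs either a `protein_path` or a `protein_sequence`, and additionally assumes that `ligand_description` must be non-empty.

```python
import csv
import io

# Column names taken from the README text in this commit.
REQUIRED_COLUMNS = [
    "complex_name",
    "protein_path",
    "ligand_description",
    "protein_sequence",
]


def check_rows(csv_text):
    """Return a list of (line_number, message) problems found in the CSV text.

    Hypothetical helper, not part of the repository.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        return [(1, "missing columns: " + ", ".join(missing))]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        # Assumption from the README: an empty protein_path falls back
        # to protein_sequence, so at least one of them must be set.
        if not (row["protein_path"] or row["protein_sequence"]):
            problems.append((line_number, "needs protein_path or protein_sequence"))
        # Assumption: a ligand (SMILES string or file path) is always required.
        if not row["ligand_description"]:
            problems.append((line_number, "ligand_description is empty"))
    return problems


# The three lines added by this commit.
EXAMPLE = """complex_name,protein_path,ligand_description,protein_sequence
1a0q,data/1a0q/1a0q_protein_processed.pdb,data/1a0q/1a0q_ligand.sdf,
1a0q_custom,data/1a0q/1a0q_protein_processed.pdb,COc(cc1)ccc1C#N,
"""

if __name__ == "__main__":
    print(check_rows(EXAMPLE))
```

The check only inspects the text of the CSV; it does not verify that the referenced `.pdb` or `.sdf` files exist or that a SMILES string is valid.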