update docstring
kjappelbaum committed Feb 16, 2024
1 parent 60bdba3 commit f052a2b
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion data/postprocess_split.py
@@ -4,6 +4,8 @@
It also merges files that have been created by `dask` if they are chunks of one large dataset.
This script needs to be run after the splitting script.
An independent check (that does not rewrite files) is `check_smiles_split.py`; it also checks for compliance with the predetermined files.
"""
import os
from glob import glob
@@ -153,7 +155,7 @@ def process_file(file: Union[str, Path], id_cols):
for id in id_cols:
test_smiles.extend(df[df["split"] == "test"][id].to_list())
val_smiles.extend(df[df["split"] == "valid"][id].to_list())

df.drop_duplicates(subset=[id], inplace=True)
test_smiles = set(test_smiles)
val_smiles = set(val_smiles)
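
For readers who want to see what collecting the held-out identifiers into sets makes possible, here is a minimal sketch. It is not the code from `postprocess_split.py`: it uses plain dictionaries instead of a pandas DataFrame, the helper names `collect_holdout_ids` and `find_leaks` are hypothetical, and the `"train"` split label is an assumption (only `"test"` and `"valid"` appear in the diff).

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]


def collect_holdout_ids(
    rows: Iterable[Row], id_cols: List[str]
) -> Tuple[Set[str], Set[str]]:
    """Gather identifiers of test and valid rows into two sets (hypothetical helper)."""
    test_ids: Set[str] = set()
    val_ids: Set[str] = set()
    for row in rows:
        for col in id_cols:
            if row["split"] == "test":
                test_ids.add(row[col])
            elif row["split"] == "valid":
                val_ids.add(row[col])
    return test_ids, val_ids


def find_leaks(
    rows: Iterable[Row], id_cols: List[str], test_ids: Set[str], val_ids: Set[str]
) -> List[Row]:
    """Return rows labelled "train" (assumed label) whose identifier is also held out."""
    holdout = test_ids | val_ids
    return [
        row
        for row in rows
        if row["split"] == "train" and any(row[col] in holdout for col in id_cols)
    ]


# Toy data: the same SMILES string appears in both train and test.
rows = [
    {"SMILES": "CCO", "split": "train"},
    {"SMILES": "CCO", "split": "test"},
    {"SMILES": "c1ccccc1", "split": "valid"},
    {"SMILES": "CCN", "split": "train"},
]

test_ids, val_ids = collect_holdout_ids(rows, ["SMILES"])
leaks = find_leaks(rows, ["SMILES"], test_ids, val_ids)
```

Converting the collected lists to sets, as the diff does, keeps each membership check constant-time on average, which matters when many files are scanned against the same held-out identifiers.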

