Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utils restructured in subfolders and removed unused imports #275

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

allaffa
Copy link
Collaborator

@allaffa allaffa commented Aug 19, 2024

No description provided.

@allaffa allaffa added the enhancement New feature or request label Aug 19, 2024
@allaffa allaffa self-assigned this Aug 19, 2024
@allaffa
Copy link
Collaborator Author

allaffa commented Aug 19, 2024

@pzhanggit @jychoi-hpc
I think I restructured all the imports.
Since this is a big PR, please let us take time to review it.
I do not want to merge it before September 30.
However, I would like to make gradual changes based on your feedback during this time.

@allaffa
Copy link
Collaborator Author

allaffa commented Aug 22, 2024

@RylieWeaver @AdithyaRaman @erdemcaliskan

Please let me know if you need help on learning how to review a PR on GitHub.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small inconsistency - this file is print/print_utils.py whereas for some files we don't have utils in the file name e.g., model/model.py, optimizer/optimizer.py.

Copy link
Collaborator Author

@allaffa allaffa Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erdemcaliskan

I cannot rename print_utils just as print because PyCharm returns a bunch of errors during the refactoring that are similar to the following one:

The name 'print' is not a valid Python identifier. Cannot update an import statement in 'stratified_sampling.py'

Do you have any alternative name to suggest? Otherwise, I will keep is as is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes sense that it wouldn't accept. Now that I checked we also have time_utils and smiles_utils. They can stay as they are or maybe printing and timing can be used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us keep the renaming as is.
We spent already enough time on renaming discussions.
I think we should move forward to other things.

Copy link
Collaborator

@erdemcaliskan erdemcaliskan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, we have a utils folder in the main and one more in hydragnn/utils which is confusing.

@@ -0,0 +1,19 @@
from .abstractbasedataset import AbstractBaseDataset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issues here but everything is repeated since they have the filename import class name which are mostly the same. A small suggestion - this would require a lot of refactoring but using importlib may yield cleaner code here:

import importlib

# List of datasets and their corresponding classes
datasets = [
    ("abstractbasedataset", "AbstractBaseDataset"),
    ("abstractrawdataset", "AbstractRawDataset"),
    ("adiosdataset", ["AdiosDataset", "AdiosWriter"]),
    ("cfgdataset", "CFGDataset"),
    ("distdataset", "DistDataset"),
    ("lsmsdataset", "LSMSDataset"),
    ("pickledataset", ["SimplePickleDataset", "SimplePickleWriter"]),
    ("serializeddataset", ["SerializedDataset", "SerializedWriter"]),
    ("xyzdataset", "XYZDataset"),
]

# Import classes dynamically
for module_name, class_names in datasets:
    module = importlib.import_module(f".{module_name}", package=__name__)
    if isinstance(class_names, list):
        for class_name in class_names:
            globals()[class_name] = getattr(module, class_name)
    else:
        globals()[class_names] = getattr(module, class_names)

# Import functions from compositional_data_splitting
from .compositional_data_splitting import (
    get_keys,
    get_elements_list,
    get_max_graph_size,
    create_dictionary_from_elements_list,
    create_dataset_categories,
    duplicate_unique_data_samples,
    generate_partition,
    compositional_stratified_splitting,
)

So now we can import them in diffent files as shown below:

from datasets import AdiosDataset

adios_dataset = AdiosDataset()

Copy link
Collaborator Author

@allaffa allaffa Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erdemcaliskan
This seems a nice restructuring, but as you said, it also seem to require a substantial restructuring which may deserve a standalone PR to be implemented.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 30 should be from hydragnn.utils.datasets.adiosdataset import AdiosWriter, AdiosDataset

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I installed adios2 from conda forge and tried running this on my local but couldn't run it due to below errors:

python train_ising.py
[2024-08-22 12:08:46,496] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-08-22 12:08:46,496] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
Distributed data parallel: gloo master at 127.0.0.1:8889
0: Command: train_ising.py

INFO (rank 0): Adios load
0: Adios reading: /home/ecalisk1/libs/git/HydraGNN/examples/ising_model/./datasets/ising_model_3_1000.bp
Traceback (most recent call last):
File "/home/ecalisk1/libs/git/HydraGNN/examples/ising_model/train_ising.py", line 319, in
trainset = AdiosDataset(fname, "trainset", comm, **opt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ecalisk1/libs/git/HydraGNN/hydragnn/utils/datasets/adiosdataset.py", line 382, in init
with ad2.open(self.filename, "r", self.comm) as f:
^^^^^^^^
AttributeError: module 'adios2' has no attribute 'open'
Exception ignored in: <function AdiosDataset.del at 0x7fa9b20ec220>
Traceback (most recent call last):
File "/home/ecalisk1/libs/git/HydraGNN/hydragnn/utils/datasets/adiosdataset.py", line 733, in del
self.f.close()
^^^^^^
AttributeError: 'AdiosDataset' object has no attribute 'f'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jychoi-hpc
I remember encountering this error before.
May you confirm that this is due to the version of ADIOS2 installed?

examples/qm7x/train.py Outdated Show resolved Hide resolved
examples/ogb/train_gap.py Outdated Show resolved Hide resolved
@allaffa
Copy link
Collaborator Author

allaffa commented Sep 3, 2024

Finally, we have a utils folder in the main and one more in hydragnn/utils which is confusing.

@erdemcaliskan
I moved the lsms utils into the hydragnn.utils folder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants