
[WIP] Move to new calculator API #517

Open · wants to merge 26 commits into main

Conversation

@SamTov (Member) commented Apr 13, 2022

Purpose

The purpose of this PR is to move the calculators to an API with one call that constructs a calculator object and another that executes it, in the following way:

my_calc = EinsteinDiffusionCoefficients(species=["Na"], data_range=50)
project.experiments.NaCl_example_data.execute_operation(my_calc)

This currently works in this branch for the Einstein diffusion coefficients ONLY.

Please read the full PR before commenting.

What is broken?

Currently it still uses the call decorator and is able to check if data is already stored. The only thing that is broken here is that you cannot call this from the project at the moment and have it loop over experiments, because at the time of the call the list of experiments is not passed to the calculator. This can, however, be put into the experiment class and will presumably be part of a broader restructuring of the calculators.

Comments

I would propose that we move to using call methods on operations that can be called in this way. I added the method to the experiment class:

    def execute_operation(self, operation: callable):
        # Attach this experiment to the operation, then run it.
        operation.experiment = self
        operation()

In this way, it doesn't matter if the operation is a calculator, transformation, or graph operation, it can be called by the same method.
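
To make this concrete, here is a minimal sketch of an operation that would satisfy this contract. It is illustrative only, not an actual MDSuite class; MinimalOperation and its attributes are hypothetical names:

    class MinimalOperation:
        """Toy operation that fits the execute_operation contract (illustrative only)."""

        def __init__(self, data_range: int = 50):
            # "Static" user arguments are stored at construction time.
            self.data_range = data_range
            # The experiment is attached later by Experiment.execute_operation.
            self.experiment = None

        def __call__(self):
            # At call time the experiment is available, so the operation can read
            # data from it and write results back to it.
            print(f"Running on {self.experiment} with data_range={self.data_range}")

Usage would mirror the proposed API: my_op = MinimalOperation(data_range=50) followed by experiment.execute_operation(my_op).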

Work program

I was thinking of going about this in the following way:

  • Decide on call API
  • Move all calculators to call method and put their attributes into their inits
  • Add functionality to call this from the Project class

If this is complete, we can merge this into main with no changes to the functionality of the package, just with the new API in which one has to define the calculator class before running it.

Once this is merged, I would suggest then moving the data handling of the calculators into the experiment and changing the database in one shot.

Comments

  • All naming and methods at this stage are for demonstration purposes only.
  • The examples are likely broken due to API changes
  • An example of the new API is in the NaCl_walkthrough notebook

Updates

  • Resolved calling from the project class with what is essentially a mirroring method:
    def execute_operation(self, operation):
        # Dispatch the same operation to every active experiment.
        for item in self.active_experiments.values():
            item.execute_operation(operation)

For some reason, however, this does not check whether the calculation was already run.

  • Removed the necessity of the call method and enabled checking the SQL database for previously saved data.
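
As a rough sketch of what that check might look like (results_in_database, load_results, and save_results are assumed helper names, not existing MDSuite API):

    def execute_operation(self, operation):
        # Hypothetical sketch: consult the experiment's SQL database before running.
        key = (type(operation).__name__, operation.data_range)
        if self.results_in_database(key):        # assumed helper
            return self.load_results(key)        # assumed helper
        operation.experiment = self
        result = operation()
        self.save_results(key, result)           # assumed helper
        return result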

@christophlohrmann (Collaborator)

I like the approach of changing the API first and then caring about the database.
As long as the "I did this already, will load instead of repeating the calculation" still works, it justifies using the __call__ of the calculator.
In the future we can probably go away from this by just calling a common function of the calculators inside execute_operation.
At that point the calculator has all the information it needs ("static info" like data_range from its init and "dynamic info" like species from the arguments passed by the experiment) and can decide if it should run or load.
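
A possible shape for that (purely illustrative; run_or_load and query_results are hypothetical names, not existing MDSuite API):

    class Experiment:
        def execute_operation(self, operation):
            # The experiment supplies the "dynamic" context ...
            operation.experiment = self
            # ... and delegates the run-or-load decision to a common calculator method.
            return operation.run_or_load()

    class SomeCalculator:
        def __init__(self, data_range: int = 50):
            self.data_range = data_range  # static info from the init
            self.experiment = None        # dynamic info, attached by the experiment

        def run_or_load(self):
            # Hypothetical: the calculator itself checks for stored results.
            stored = self.experiment.query_results(self)  # assumed helper
            return stored if stored is not None else self.run_computation()

        def run_computation(self):
            ...  # the actual analysis would live here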

@SamTov (Member, Author) commented Apr 22, 2022

> I like the approach of changing the API first and then caring about the database. As long as the "I did this already, will load instead of repeating the calculation" still works, it justifies using the __call__ of the calculator. In the future we can probably go away from this by just calling a common function of the calculators inside execute_operation. At that point the calculator has all the information it needs ("static info" like data_range from its init and "dynamic info" like species from the arguments passed by the experiment) and can decide if it should run or load.

The current issue is an odd one. When you call this new API directly from an experiment, it checks whether it has already been run; however, if you call it from the project, it does not. I haven't tried to find out why, but it is the only issue preventing full adoption at this stage.

@Fratorhe (Collaborator)

This is a good idea.
I feel it improves the decoupling of the code, such that in principle you could use the calculator/transformation anywhere else.
This will also make testing easier.
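
As a hypothetical illustration of that testing benefit (the names below are made up for the sketch and do not come from the MDSuite test suite):

    class StubExperiment:
        """Stand-in experiment so an operation can be exercised without a real project."""

        def __init__(self, data):
            self.data = data

    def test_operation_runs_without_a_project():
        op = MinimalOperation(data_range=10)  # toy operation from the sketch above
        op.experiment = StubExperiment(data=[1, 2, 3])
        op()  # no project, experiment database, or SQL layer required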

@PythonFZ (Member)

I tried to visualize our approach here a bit. Is this what we are after?

flowchart TD

calc[Calculator] -->|pass user arguments| B(Calculator Instance)

B --> |exp.execute_operation| exp[Experiment]

exp --> db{Check DB}
db --> |exist| res[Show results]

db --> |new, pass exp args| D[run calculation]
B --> D
D -->|store in db| exp

I'm still struggling a bit with this:

my_calc = EinsteinDiffusionCoefficients(species=["Na"], data_range=50)
project.experiments.NaCl_example_data.execute_operation(my_calc)

In my mind, I would pass the data to the calculator, and not the calculator to the data.
But I know we cannot do this nicely, because the calculator writes to the experiment database.
But if it did, it could simplify things:

flowchart TD

calc[Calculator] -->|pass user arguments| B(Calculator Instance)
B --> prep[Prepare Calculation]
exp[Experiment Data + DB handle] --> prep

prep --> db{check DB}
db --> yes[return results]
db --> no[run computation, save in db]
no --> yes

I do prefer having the experiment, or rather the project, handle all the database stuff and not the calculators, though.
Ideally the calculator just takes some dataset and returns a dataset.
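
To make that last point concrete, here is a sketch of what a "dataset in, dataset out" calculator could look like under the second flowchart; the function and the load_msd/store helpers are hypothetical, not existing code:

    import numpy as np

    def einstein_diffusion(msd_dataset, time_step: float) -> dict:
        """Hypothetical pure calculator: takes an MSD dataset, returns a result dict,
        and never touches a database."""
        times = np.arange(len(msd_dataset)) * time_step
        slope, _ = np.polyfit(times, msd_dataset, 1)
        # Einstein relation in three dimensions: MSD = 6 D t, so D = slope / 6.
        return {"diffusion_coefficient": slope / 6.0}

    # The experiment (or project) would then own all database interaction, e.g.:
    # result = einstein_diffusion(experiment.load_msd("Na"), time_step=0.002)  # load_msd assumed
    # experiment.store(result)                                                 # store assumed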

@Fratorhe (Collaborator)

I agree with this sentence: "Ideally the calculator just takes some dataset and returns a dataset."
The second sketch does indeed look more linear and less coupled, which should be more beneficial in the long term for maintenance.
However, it is unclear to me who would perform the "Prepare calculation" and "check DB" steps in this case.

@SamTov (Member, Author) commented Apr 25, 2022

So the goal will be the following:

  1. Instantiate the calculator
  2. Call experiment.something_nice(calculator, **kwargs)
  3. Experiment loads data required in memory safe manner or checks if it exists
  4. Passes data to calculator
  5. Calculator returns data
  6. Experiment puts data into SQL and can return if necessary

The kwargs in this case could be for a post-RDF calculator where you pass RDF data or an MSD. In this way, you have the flexibility to pass data, but it is all handled by the experiment, which provides a linear flow of data and communication between databases.
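
A rough sketch of steps 3-6 as they might sit on the experiment (something_nice, load_data, and store_results are placeholder names from the list above, not implemented API):

    def something_nice(self, calculator, **kwargs):
        # 3. Load (or locate) the required data in a memory-safe manner.
        data = self.load_data(calculator.required_data)  # assumed helper
        # 4./5. Pass the data, plus any extra inputs such as RDF or MSD data, to the calculator.
        result = calculator(data, **kwargs)
        # 6. The experiment, not the calculator, writes the result to SQL.
        self.store_results(calculator, result)  # assumed helper
        return result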

@SamTov (Member, Author) commented Jun 21, 2022

I have a very odd bug at this stage. On the run where we compute the data, the calculator cannot find it in the database and does not load it, but when I run it a second time, it finds the data in the database and returns it.

This is resolved
