-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Move to new calculator API #517
base: main
Are you sure you want to change the base?
Conversation
I like the approach of changing the API first and then caring about the database. |
The issue currently is an odd one. When you call this new api directly from an experiment it checks whether it has already been run, however, if you call it from the project, it does not. I haven't tried to find out why but it is the only issue preventing full adoption at this stage. |
This is a good idea. |
I tried to visualize our approach here a bit. Is this what we are after? flowchart TD
calc[Calculator] -->|pass user arguments| B(Calculator Instance)
B --> |exp.execute_operation| exp[Experiment]
exp --> db{Check DB}
db --> |exist| res[Show results]
db --> |new, pass exp args| D[run calculation]
B --> D
D -->|store in db| exp
I'm still struggeling a bit with this: my_calc = EinsteinDiffusionCoefficients(species=["Na"], data_range=50)
project.experiments.NaCl_example_data.execute_operation(my_calc) In my mind I would pass the data to the calculator, and not the calculator to the data. flowchart TD
calc[Calculator] -->|pass user arguments| B(Calculator Instance)
B --> prep[Prepare Calculation]
exp[Experiment Data + DB handle] --> prep
prep --> db{check DB}
db --> yes[return results]
db --> no[run computation, save in db]
no --> yes
I do prefer having the experiment or rather the project handle all the database stuff though and not the calculators. |
I agree with this sentence: "Ideally the calculator just takes some dataset and returns a dataset." |
So the goal will be the following:
The kwargs in this case could be for a post-rdf calculator where you pass RDF data or an MSD. In this way, you have flexibility to pass data but it is all handled by the experiment which provides a linear flow of data and communication between databases. |
This is resolved |
…o SamTov_Calculator_data_move # Conflicts: # examples/notebooks/NaCl_walkthrough.ipynb
…culator_data_move
# Conflicts: # examples/notebooks/Molten_Salt_Comparison.ipynb # mdsuite/calculators/coordination_number_calculation.py # mdsuite/calculators/einstein_diffusion_coefficients.py # mdsuite/calculators/green_kubo_distinct_diffusion_coefficients.py # mdsuite/calculators/green_kubo_ionic_conductivity.py # mdsuite/calculators/green_kubo_self_diffusion_coefficients.py # mdsuite/calculators/potential_of_mean_force.py # mdsuite/calculators/structure_factor.py
…ware/MDSuite into SamTov_Calculator_data_move # Conflicts: # CI/integration_tests/calculators/test_kirkwood_buff_integrals.py
Purpose
The purpose of this PR is to move the calculator all to one call that constructs a calculator object and another that executes it in the following way:
This currently works in this branch for the Einstein diffusion coefficients ONLY.
Please read the full PR before commenting.
What is broken?
Currently it still uses the call decorator and is able to check if data is already stored, the only thing that is broken here is that you cannot call this from the project at the moment and have it loop over experiments. This is because at the moment of the call the list of experiments is not parsed to the calculator, this can, however, be put into the experiment class. This will be part of a broader restructuring of the calculators assumedly.Comments
I would propose that we move to using call methods on operations that can be called in this way. I added the method to the experiment class:
In this way, it doesn't matter if the operation is a calculator, transformation, or graph operation, it can be called by the same method.
Work program
I was thinking of going about this in the following way:
If this is complete, we can merge this into main with now changes to the functionality of the package just with the new API in which one has to define the calculator class before running it.
Once this is merged, I would suggest then moving the data handling of the calculators into the experiment and changing the database in one shot.
Comments
Updates
This however does not check if the calculation was already run for some reason.