-
Hello, thank you for the great package! I have a large number of 80-atom structures that I am attempting to relax. To try and speed this up I am using the Python multiprocessing package to execute multiple relaxations at once. I am betting the error is most likely on my end... Here is a rough outline of the parallelization I am running:

```python
import multiprocessing as multip

from chgnet.model import CHGNet, StructOptimizer


def relax_one_struc(single_struc_info, model):
    struc_name, ini_struc = single_struc_info
    relaxer = StructOptimizer(model=model, use_device="cpu")
    result = relaxer.relax(ini_struc, verbose=True)
    final_struc = result["final_structure"]
    return struc_name, final_struc.as_dict()


def relax_many_strucs(all_strucs, model):
    with multip.Pool(processes=3) as pool:
        results = pool.starmap(
            relax_one_struc,
            [(struc_info, model) for struc_info in all_strucs.items()],
        )
    return dict(results)


def main():
    # Load three pymatgen structures, each a dict of the form {struc_name: Structure}:
    # first_initial_struc = {struc_1_name: struc_1}
    # second_initial_struc = {struc_2_name: struc_2}
    # third_initial_struc = {struc_3_name: struc_3}
    three_strucs = {**first_initial_struc, **second_initial_struc, **third_initial_struc}
    pretrained_chgnet = CHGNet.load(use_device="cpu")
    relaxed_strucs = relax_many_strucs(three_strucs, pretrained_chgnet)
    return relaxed_strucs


if __name__ == "__main__":
    main()
```

In testing on my laptop, when I run the three structures back to back, it takes ~100 seconds total (about 30 relaxation steps per structure). However, when running in parallel, the total execution takes ~400 seconds. I have run similar tests on an HPC and there is an equivalent slowdown. When running on my laptop's GPU (via MPS), the parallelized calculation takes about the same amount of time as the sequential one (which I think makes sense because everything is accessing the same compute). This is long enough, so I will leave it at that. I would greatly appreciate any insight as to why there might be a slowdown, and any changes or alternative approaches I can take for better results. Please let me know if I can provide more details. Thanks!

Quick edit - forgot to include, but I did some time testing, and the relaxation steps are what experience the slower speeds (time for loading the model, initializing classes, and overhead for multiprocessing are all minimal).
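For reference, one possibility I have not ruled out (purely an assumption on my part, not a confirmed diagnosis) is PyTorch's default intra-op threading: each pool worker may spawn threads on all available cores, so three workers could oversubscribe the CPU. A minimal sketch of how I could pin each worker to a single thread:

```python
import os

import torch


def _init_worker():
    # Runs once in every pool process before any relaxation starts;
    # caps each worker at a single intra-op thread so the workers
    # don't compete for the same cores.
    os.environ["OMP_NUM_THREADS"] = "1"
    torch.set_num_threads(1)


# then: with multip.Pool(processes=3, initializer=_init_worker) as pool: ...
```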
-
arguably a more efficient way to parallelize structure relaxation would be to load a single model and then batch the structures in the model's forward pass, making better use of large tensor processing. since different structures need different numbers of relaxation steps, that would require implementing a pool-based ASE calculator that checks if any structures have finished relaxing and swaps those out for new structures from the pool in the next forward pass. let me know if you're interested in working on that, happy to collaborate
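roughly what i have in mind, as an untested sketch: it assumes `CHGNet.predict_structure` accepts a list of structures and returns per-structure forces under the key `"f"` (check the current chgnet API for the exact signature), and it uses plain steepest descent where a real implementation would use an ASE optimizer like FIRE:

```python
import numpy as np
from pymatgen.core import Structure

from chgnet.model import CHGNet


def batched_relax(structures: dict, fmax=0.1, step_size=0.02, batch_size=8, max_steps=500):
    """Relax a pool of structures with batched forward passes, swapping
    converged structures out for waiting ones between passes."""
    model = CHGNet.load(use_device="cpu")  # one shared model for all structures
    pool = list(structures.items())  # (name, Structure) pairs waiting to start
    active, done = [], {}
    for _ in range(max_steps):
        # top up the active batch from the pool
        while pool and len(active) < batch_size:
            active.append(pool.pop())
        if not active:
            break
        # one batched forward pass over every active structure
        preds = model.predict_structure([struc for _, struc in active])
        still_active = []
        for (name, struc), pred in zip(active, preds):
            forces = np.asarray(pred["f"])
            if np.linalg.norm(forces, axis=1).max() < fmax:
                done[name] = struc  # converged: free its slot for a pool structure
                continue
            # steepest-descent position update (stand-in for a real optimizer)
            new_coords = struc.cart_coords + step_size * forces
            still_active.append(
                (name, Structure(struc.lattice, struc.species, new_coords, coords_are_cartesian=True))
            )
        active = still_active
    return done  # structures still unconverged at max_steps are dropped in this sketch
```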
-
Hello, hope you both are doing well! Has there been progress on this? I'd be interested in this functionality in ASE with CHGNet (and other PyTorch-based MLIPs) and can offer some assistance if needed, as I'm trying to do many NEB calculations split over several GPUs. At the moment, sequentially doing NEB takes me about 1-2 days for 10000 barriers... Thanks!
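For context, the split I have in mind on my end looks roughly like the sketch below: one worker process per GPU, each loading its own copy of the model. `run_neb` is a hypothetical placeholder for an actual ASE NEB setup, and I am assuming `use_device` accepts per-GPU strings like `"cuda:0"`:

```python
import multiprocessing as mp

from chgnet.model import CHGNet


def run_neb(model, images):
    # Hypothetical placeholder: attach the model's calculator to the images
    # and run an ASE NEB here, returning the converged barrier.
    raise NotImplementedError


def gpu_worker(device, job_q, out_q):
    model = CHGNet.load(use_device=device)  # one model per GPU, loaded once
    for name, images in iter(job_q.get, None):  # None is the shutdown sentinel
        out_q.put((name, run_neb(model, images)))


def run_all(jobs, devices=("cuda:0", "cuda:1")):
    ctx = mp.get_context("spawn")  # spawn is required when workers use CUDA
    job_q, out_q = ctx.Queue(), ctx.Queue()
    workers = [ctx.Process(target=gpu_worker, args=(dev, job_q, out_q)) for dev in devices]
    for w in workers:
        w.start()
    for job in jobs:  # jobs: list of (name, images) pairs
        job_q.put(job)
    for _ in workers:
        job_q.put(None)  # one sentinel per worker
    results = dict(out_q.get() for _ in jobs)
    for w in workers:
        w.join()
    return results
```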