Commit

Merge pull request #9 from wehs7661/remove_gmxapi
Remove the use of gmxapi
wehs7661 authored May 1, 2023
2 parents c71ad05 + 156d70b commit 239776a
Showing 12 changed files with 222 additions and 372 deletions.
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -48,12 +48,12 @@ jobs:
- run:
name: Install the ensemble_md package
command: |
export gmxapi_ROOT=$HOME/pkgs # set the environment variable so gmxapi can be installed successfully
python3 -m pip install '.[gmxapi]'
python3 -m pip install .
- run:
name: Run unit tests
command: |
source $HOME/pkgs/bin/GMXRC
pip3 install pytest
pip3 install pytest-cov
pytest -vv --disable-pytest-warnings --cov=ensemble_md --cov-report=xml --color=yes ensemble_md/tests/
15 changes: 0 additions & 15 deletions .codecov.yml

This file was deleted.

12 changes: 0 additions & 12 deletions .lgtm.yml

This file was deleted.

2 changes: 1 addition & 1 deletion docs/conf.py
@@ -176,4 +176,4 @@
# autoclass_content = 'both'
autodoc_member_order = 'bysource'
napoleon_attr_annotations = True
autodoc_mock_imports = ["mpi4py", "gmxapi"]
autodoc_mock_imports = ["mpi4py"] # gmxapi was also mocked in older versions of ensemble_md
17 changes: 3 additions & 14 deletions docs/getting_started.rst
@@ -4,21 +4,19 @@
running, and analyzing GROMACS simulation ensembles. The current implementation is
mainly for synchronous ensemble of expanded ensemble (EEXE), but we will develop
methods like asynchronous EEXE, or ensemble of alchemical metadynamics in the future.
In the current implementation, `gmxapi`_, which is a higher level Python API of GROMACS,
In the current implementation, the module :code:`subprocess`
is used to launch GROMACS commands, but we will switch to `SCALE-MS`_ for this purpose
in the future when possible.


.. _`gmxapi`: https://manual.gromacs.org/current/gmxapi/
.. _`SCALE-MS`: https://scale-ms.readthedocs.io/en/latest/


2. Installation
===============
2.1. Requirements
-----------------
Before installing :code:`ensemble_md`, one should have working versions of `GROMACS`_
and `gmxapi`_. Please refer to the linked documentations for full installation instructions.
Before installing :code:`ensemble_md`, one should have a working version of `GROMACS`_. Please refer to the linked documentation for full installation instructions.
All the other pip-installable dependencies of :code:`ensemble_md` (specified in :code:`setup.py` of the package)
will be automatically installed during the installation of the package.

@@ -31,14 +29,6 @@ will be automatically installed during the installation of the package.

pip install ensemble-md

By default, the command above does not install :code:`gmxapi`, so one needs to either
follow the full installation instructions of :code:`gmxapi`, or install
:code:`gmxapi` along with the package (after sourcing the GROMACS executable, e.g.
:code:`/usr/local/gromacs/bin/GMXRC`) with the following command:
::

pip install ensemble-md[gmxapi]

2.3. Installation from source
-----------------------------
One can also install :code:`ensemble_md` from the source code, which is available in our
@@ -49,8 +39,7 @@ One can also install :code:`ensemble_md` from the source code, which is availabl
cd ensemble_md/
pip install .

To install the package along with :code:`gmxapi`, replace the last command with
:code:`pip install '.[gmxapi]'`. If you are interested in contributing to the project, append the
If you are interested in contributing to the project, append the
last command with the flag :code:`-e` to install the project in editable mode
so that changes you make in the source code take effect without reinstalling the package.
(Pull requests to the project repository are welcome!)
63 changes: 44 additions & 19 deletions docs/simulations.rst
@@ -4,7 +4,11 @@
===============================
:code:`ensemble_md` provides three command-line interfaces (CLIs): :code:`explore_EEXE`, :code:`run_EEXE` and :code:`analyze_EEXE`.
:code:`explore_EEXE` helps the user to figure out possible combinations of EEXE parameters, while :code:`run_EEXE` and :code:`analyze_EEXE`
can be used to perform and analyze EEXE simulations, respectively. Here is the help message of :code:`explore_EEXE`:
can be used to perform and analyze EEXE simulations, respectively. Below we provide more details about each of these CLIs.

1.1. CLI `explore_EEXE`
-----------------------
Here is the help message of :code:`explore_EEXE`:

::

@@ -25,7 +29,9 @@ can be used to perform and analyze EEXE simulations, respectively. Here is the h
replicas.


And here is the help message of :code:`run_EEXE`:
1.2. CLI `run_EEXE`
-------------------
Here is the help message of :code:`run_EEXE`:

::

@@ -52,6 +58,18 @@ And here is the help message of :code:`run_EEXE`:
The maximum number of warnings in parameter specification to be
ignored.

In our current implementation, it is assumed that all replicas of an EEXE simulation are performed in
parallel using MPI. Naturally, performing an EEXE simulation using :code:`run_EEXE` requires a command-line interface
to launch MPI processes, such as :code:`mpirun` or :code:`mpiexec`. For example, on a 128-core node
in a cluster, one may use :code:`mpirun -np 4 run_EEXE` (or :code:`mpiexec -n 4 run_EEXE`) to run an EEXE simulation composed of 4
replicas with 4 MPI processes. Note that in this case, it is often recommended to explicitly specify
the resources allocated to each replica. For example, one can specify :code:`{'-nt': 32}`
for the EEXE parameter :code:`runtime_args` (specified in the input YAML file, see :ref:`doc_EEXE_parameters`),
so that each of the 4 replicas uses 32 threads (assuming thread-MPI GROMACS), taking full advantage
of all 128 cores.
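
The arithmetic behind the example above can be sketched in a few lines of Python. The helper :code:`threads_per_replica` is hypothetical and not part of :code:`ensemble_md`. It assumes the cores of a single node are split evenly among the replicas.

```python
def threads_per_replica(n_cores: int, n_replicas: int) -> dict:
    """Return a runtime_args-style dictionary that splits n_cores evenly among replicas.

    This is an illustrative helper, not an ensemble_md function.
    """
    if n_replicas <= 0:
        raise ValueError("n_replicas must be a positive integer")
    if n_cores < n_replicas:
        raise ValueError("There must be at least one core per replica")
    # Integer division: any leftover cores are simply left unused.
    return {'-nt': n_cores // n_replicas}


# 4 replicas on a 128-core node, as in the example in the text
print(threads_per_replica(128, 4))
```

The resulting dictionary has the same form as the value one would specify for :code:`runtime_args` in the input YAML file.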

1.3. CLI `analyze_EEXE`
-----------------------
Finally, here is the help message of :code:`analyze_EEXE`:

::
@@ -119,11 +137,9 @@ other during the simulation ensemble. Check :ref:`doc_parameters` for more detai

Step 2: Run the 1st iteration
-----------------------------
With all the input files/parameters set up in the previous run, one can use :obj:`.run_EEXE` to run the
first iteration. Specifically, :obj:`.run_EEXE` uses :code:`gmxapi.commandline_operation` to launch a GROMACS
:code:`grompp` command to generate the input MDP file. Then, if :code:`parallel` is specified as :code:`True`
in the input YAML file, :code:`gmxapi.mdrun` will be used to run GROMACS :code:`mdrun` commands in parallel,
otherwise :code:`gmxapi.commandline_operation` will be used to run simulations serially.
With all the input files/parameters set up in the previous step, one can run the first iteration
using :obj:`.run_EEXE`, which uses :code:`subprocess.run` to launch GROMACS :code:`grompp`
and :code:`mdrun` commands in parallel.
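
As a minimal sketch of this approach (not the actual implementation in :code:`ensemble_md`), the snippet below assembles GROMACS :code:`grompp` and :code:`mdrun` command lines and passes a command to :code:`subprocess.run`. The helper names (:code:`grompp_cmd`, :code:`mdrun_cmd`, :code:`launch`) and all file names are hypothetical. To keep the example runnable without GROMACS, the demonstration at the end launches the Python interpreter as a stand-in for the :code:`gmx` executable.

```python
import subprocess
import sys


def grompp_cmd(gmx, mdp, gro, top, tpr):
    """Assemble a `gmx grompp` command that generates a tpr file (hypothetical helper)."""
    return [gmx, 'grompp', '-f', mdp, '-c', gro, '-p', top, '-o', tpr]


def mdrun_cmd(gmx, tpr, deffnm):
    """Assemble a `gmx mdrun` command for a given tpr file (hypothetical helper)."""
    return [gmx, 'mdrun', '-s', tpr, '-deffnm', deffnm]


def launch(cmd):
    """Run a command with subprocess.run and return (returncode, stdout, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Commands one replica might launch (file names are placeholders).
print(' '.join(grompp_cmd('gmx', 'expanded.mdp', 'sys.gro', 'sys.top', 'sys_EE.tpr')))
print(' '.join(mdrun_cmd('gmx', 'sys_EE.tpr', 'sys_EE')))

# Stand-in for a real GROMACS call so that the sketch runs anywhere.
returncode, stdout, stderr = launch([sys.executable, '-c', 'print("ok")'])
print(returncode, stdout.strip())
```

Passing the command as a list (rather than a single string with :code:`shell=True`) avoids shell quoting issues when paths contain spaces.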

Step 3: Set up the new iteration
--------------------------------
@@ -194,7 +210,15 @@ In the current implementation of the algorithm, 22 parameters can be specified i
Note that the two CLIs :code:`run_EEXE` and :code:`analyze_EEXE` share the same input YAML file, so we also
include parameters for data analysis here.

3.1. Simulation inputs
3.1. GROMACS executable
-----------------------

- :code:`gmx_executable`: (Required)
The GROMACS executable to be used to run the EEXE simulation. The value could be as simple as :code:`gmx`
or :code:`gmx_mpi` if the executable has been sourced. Otherwise, the full path of the executable should be given (e.g.
:code:`/usr/local/gromacs/bin/gmx`, i.e., the path returned by the command :code:`which gmx`).
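
A value for :code:`gmx_executable` can be checked before launching any simulation. The sketch below uses :code:`shutil.which` from the Python standard library, which accepts either a bare command name that is looked up on :code:`PATH` or a full path. The function :code:`resolve_gmx` is a hypothetical helper and is not claimed to be how :code:`ensemble_md` validates this parameter.

```python
import shutil
import sys


def resolve_gmx(gmx_executable):
    """Return the resolved path of the given executable, or raise if it cannot be found.

    Hypothetical helper: works for names on PATH (e.g. 'gmx') and for full paths.
    """
    path = shutil.which(gmx_executable)
    if path is None:
        raise FileNotFoundError(
            f"Cannot find '{gmx_executable}'. Source GMXRC or provide the full path."
        )
    return path


# The Python interpreter is used here as a stand-in for gmx so the example runs anywhere.
print(resolve_gmx(sys.executable))
```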

3.2. Simulation inputs
----------------------

- :code:`gro`: (Required)
@@ -204,11 +228,11 @@ include parameters for data analysis here.
- :code:`mdp`: (Required)
The MDP template that has the whole range of :math:`λ` values.

3.2. EEXE parameters
.. _doc_EEXE_parameters:

3.3. EEXE parameters
--------------------

- :code:`parallel`: (Required)
Whether the replicas of EEXE should be run in parallel or not.
- :code:`n_sim`: (Required)
The number of replica simulations.
- :code:`n_iter`: (Required)
@@ -241,7 +265,7 @@ include parameters for data analysis here.
Additional runtime arguments to be appended to the GROMACS :code:`mdrun` command provided in a dictionary.
For example, one could have :code:`{'-nt': 16}` to run the simulation using 16 threads.
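
To illustrate how such a dictionary could be appended to an :code:`mdrun` command, here is a minimal sketch. The function :code:`append_runtime_args` and the file prefix :code:`sys_EE` are hypothetical. The sketch assumes every flag takes exactly one value and that an unspecified parameter arrives as :code:`None`.

```python
def append_runtime_args(cmd, runtime_args=None):
    """Return a copy of cmd with each flag/value pair in runtime_args appended.

    Hypothetical helper: assumes every flag in runtime_args takes exactly one value.
    """
    full_cmd = list(cmd)  # copy so that the caller's list is not modified
    for flag, value in (runtime_args or {}).items():
        # subprocess expects every argument to be a string
        full_cmd.extend([str(flag), str(value)])
    return full_cmd


base_cmd = ['gmx', 'mdrun', '-deffnm', 'sys_EE']
print(append_runtime_args(base_cmd, {'-nt': 16}))
```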

3.3. Output settings
3.4. Output settings
--------------------
- :code:`verbose`: (Optional, Default: :code:`True`)
Whether a verbose log is wanted.
@@ -250,7 +274,7 @@ include parameters for data analysis here.

.. _doc_analysis_params:

3.4. Data analysis
3.5. Data analysis
------------------
- :code:`msm`: (Optional, Default: :code:`False`)
Whether to build Markov state models (MSMs) for the EEXE simulation and perform relevant analysis.
@@ -271,20 +295,21 @@ include parameters for data analysis here.
- :code:`seed`: (Optional, Default: None)
The random seed to use in bootstrapping.

3.5. A template input YAML file
3.6. A template input YAML file
-------------------------------
For convenience, here is a template of the input YAML file, with each optional parameter set to its default value and each required
parameter left blank. Note that specifying :code:`null` is the same as leaving the parameter unspecified (i.e. :code:`None`).

::
# Section 1: GROMACS executable
gmx_executable:

# Section 1: Simulation inputs
# Section 2: Simulation inputs
gro:
top:
mdp:

# Section 2: EEXE parameters
parallel:
# Section 3: EEXE parameters
n_sim:
n_iter:
s:
@@ -297,11 +322,11 @@ parameters left with a blank. Note that specifying :code:`null` is the same as l
grompp_args: null
runtime_args: null

# Section 3: Output settings
# Section 4: Output settings
verbose: True
n_ckpt: 100

# Section 4: Data analysis
# Section 5: Data analysis
msm: False
free_energy: False
df_spacing: 1
61 changes: 19 additions & 42 deletions ensemble_md/cli/run_EEXE.py
@@ -9,7 +9,6 @@
####################################################################
import os
import sys
import glob
import time
import copy
import shutil
@@ -90,6 +89,8 @@ def main():

# Step 2: If there is no checkpoint file found/provided, perform the 1st iteration (index 0)
if os.path.isfile(args.ckpt) is False:
start_idx = 1

# 2-1. Set up input files for all simulations with 1 rank
if rank == 0:
for i in range(EEXE.n_sim):
@@ -99,46 +100,29 @@ def main():
MDP.write(f"sim_{i}/iteration_0/{EEXE.mdp.split('/')[-1]}", skipempty=True)

# 2-2. Run the first ensemble of simulations
md = EEXE.run_EEXE(0)
EEXE.run_EEXE(0)

# 2-3. Restructure the directory (move the files from mdrun_0_i0_* to sim_*/iteration_0)
if rank == 0:
work_dir = md.output.directory.result()
for i in range(EEXE.n_sim):
if EEXE.verbose is True:
print(f' Moving files from {work_dir[i].split("/")[-1]}/ to sim_{i}/iteration_0/ ...')
print(f' Removing the empty folder {work_dir[i].split("/")[-1]} ...')
for f in glob.glob(f'{work_dir[i]}/*'):
shutil.move(f, f'sim_{i}/iteration_0/')
os.rmdir(work_dir[i])
start_idx = 1
else:
if rank == 0:
# If there is a checkpoint file, we see the execution as an extension of an EEXE simulation
ckpt_data = np.load(args.ckpt)
start_idx = len(ckpt_data[0])
start_idx = len(ckpt_data[0]) # The length should be the same for the same axis
print(f'\nGetting prepared to extend the EEXE simulation from iteration {start_idx} ...')

print('Deleting corrupted data ...')
corrupted = glob.glob('gmxapi.commandline.cli*') # corrupted iteration
corrupted.extend(glob.glob('mdrun*'))
for i in corrupted:
shutil.rmtree(i)
if len(corrupted) == 0:
corrupt_bool = False

for i in range(EEXE.n_sim):
n_finished = len(next(os.walk(f'sim_{i}'))[1]) # number of finished iterations (the last might be initialized but corrupted though) # noqa: E501
if n_finished == EEXE.n_iter and corrupt_bool is False:
print('Extension aborted: The expected number of iterations have been completed!')
sys.exit()
else:
print('Deleting data generated after the checkpoint ...')
if start_idx == EEXE.n_iter:
print('Extension aborted: The expected number of iterations have been completed!')
sys.exit()
else:
print('Deleting data generated after the checkpoint ...')
for i in range(EEXE.n_sim):
n_finished = len(next(os.walk(f'sim_{i}'))[1]) # number of finished iterations
for j in range(start_idx, n_finished):
print(f' Deleting the folder sim_{i}/iteration_{j}')
shutil.rmtree(f'sim_{i}/iteration_{j}')

# Read g_vecs.npy and rep_trajs.npy so that new data can be appended, if any.
# Note that these two arrays are created in rank 0 and should always be operated in rank 0,
# or broadcasting is required.
EEXE.rep_trajs = [list(i) for i in ckpt_data]
if os.path.isfile(args.g_vecs) is True:
EEXE.g_vecs = [list(i) for i in np.load(args.g_vecs)]
@@ -209,7 +193,9 @@ def main():
MDP.write(f"sim_{j}/iteration_{i}/{EEXE.mdp.split('/')[-1]}", skipempty=True)
# In run_EEXE(i, swap_pattern), where the tpr files will be generated, we use the top file at the
# level of the simulation (the file that will be shared by all simulations). For the gro file, we pass
# swap_patter to the function to figure it out internally.
# swap_pattern to the function to figure it out internally.
else:
swap_pattern = None

if -1 not in EEXE.equil and 0 not in EEXE.equil:
# This is the case where the weights are equilibrated in a weight-updating simulation.
@@ -220,20 +206,11 @@ def main():

# Step 4: Perform another iteration
# 4-1. Run another ensemble of simulations
md = EEXE.run_EEXE(i, swap_pattern)
swap_pattern = comm.bcast(swap_pattern, root=0)
EEXE.run_EEXE(i, swap_pattern)

if rank == 0:
# 4-2. Restructure the directory (move the files from mdrun_{i}_i0_* to sim_*/iteration_{i})
work_dir = md.output.directory.result()
for j in range(EEXE.n_sim):
if EEXE.verbose is True:
print(f' Moving files from {work_dir[j].split("/")[-1]}/ to sim_{j}/iteration_{i}/ ...')
print(f' Removing the empty folder {work_dir[j].split("/")[-1]} ...')
for f in glob.glob(f'{work_dir[j]}/*'):
shutil.move(f, f'sim_{j}/iteration_{i}/')
os.rmdir(work_dir[j])

# 4-3. Save data
# 4-2. Save data
if (i + 1) % EEXE.n_ckpt == 0:
if len(EEXE.g_vecs) != 0:
# Save g_vec as a function of time if weight combination was used.