A Rust-based implementation of parallel Hooke-Jeeves optimization, exploring parallelization schemes such as multithreading, process-based parallelism with MPI and hybrid approaches for efficient global optimization.
Global optimization is a critical problem in applied mathematics with wide-ranging applications in fields such as economics, chemistry, biology and engineering. Unlike local optimization, which aims to find the minimum or maximum of an objective function in a localized region, global optimization seeks to identify the global extremum (minimum or maximum) of the objective function across the entire solution space, accounting for all local extrema.
A commonly employed strategy in global optimization is Multistart Local Search, where a local optimization algorithm is initialized from several randomly chosen starting points. This method enhances the likelihood of locating the global minimum by facilitating the exploration of multiple regions within the solution space. The success of this approach is contingent upon the ability of the local optimization algorithm to effectively traverse diverse areas of the search space. Parallelism is crucial to this process, as it allows for the concurrent execution of multiple local searches, thereby enhancing computational performance. By leveraging parallelism, the efficiency of the global optimization process is significantly improved, resulting in faster convergence and a higher likelihood of successfully identifying the global optimum.
The Hooke-Jeeves direct search method is a well-known derivative-free optimization algorithm. It operates based on a pattern search strategy, iteratively exploring the search space by moving along predefined directions from the current point. If a direction yields a better objective function value, the current point is updated accordingly; otherwise, the search parameters are adjusted and the process is repeated. Its derivative-free nature makes the method particularly well-suited for optimization problems where the objective function is non-differentiable, discontinuous or computationally expensive, ensuring effective exploration of the solution space even in the absence of gradient information.
In this project, various parallelization schemes are applied at the trial level to enhance the efficiency of the Hooke-Jeeves method in locating the global minimum of an objective function. By distributing the optimization trials across multiple workers, the search for the global optimum can be conducted simultaneously in different regions of the solution space. This parallelization accelerates the optimization process, reducing computation time and increasing the chances of identifying the global minimum.
The Rosenbrock function (also known as the banana function) is chosen as the objective function for this optimization task. It is mathematically defined as:
The Rosenbrock function is a common benchmark for optimization algorithms, having a global minimum at
Note
An implementation of the Hooke-Jeeves method in C is available at NetLib. This implementation computes a point
To get started with this project, ensure that the following prerequisites are met:
-
Rust (edition >= 2021) is installed on your system.
-
An MPI implementation is installed, such as:
Ensure that the mpirun
command is available in your system's PATH for running the MPI-based binaries.
- Clone the repository and navigate to the project's directory:
git clone https://github.com/Sofosss/Parallel-Hooke-Jeeves.git cd Parallel-Hooke-Jeeves
- The project includes 4 binary crates (each corresponding to a different scheme of the Hooke-Jeeves Method (multithreading, multiprocessing, hybrid and sequential)). Install them with:
cargo install --path .
Note
The binaries will be installed in the ~/.cargo/bin directory. Make sure this directory is included in your PATH.
- Alternatively, build and copy the binaries manually:
cargo build --release # Repeat the cp command for each binary crate as required cp ./target/release/<binary-name> /usr/local/bin
- Verify installation:
# Ensure the binaries are installed correctly by checking their versions <binary-name> --version
All binaries require two mandatory arguments:
- nvars: Number of variables in the optimization problem.
- ntrials: Number of trials to perform.
- Multithreading:
Requires an additional argument specifying the number of threads.threads_hj <nvars> <ntrials> <nthreads>
- Multiprocessing (MPI):
Requires execution via mpirun or a compatible MPI runner. Specify the number of processes using -n or -np.
mpirun -n <nprocesses> mpi_hj <nvars> <ntrials>
- Hybrid (MPI + threads):
Requires both the number of processes and threads.
mpirun -n <nprocesses> hybrid_hj <nvars> <ntrials> <nthreads>
Note
If you prefer not to manually build or install the binaries using cargo
, you can use the provided script. This script automates the process of building and running the required binary for any of the implemented schemes. To use the script, ensure it is executable by running the command chmod +x run.sh
. Once the script is ready, you can run it by specifying the binary name along with the required arguments.
In this project, three parallelization schemes for the Multistart Hooke-Jeeves Optimization Method are implemented, each designed to efficiently find the global minimum of the Rosenbrock objective function:
-
Multithreading: In the multithreading scheme, parallelization is implemented using the thread module from the Rust Standard Library. Multiple threads are spawned, each independently executing the Hooke-Jeeves method on distinct random starting points. Each thread identifies the local minimum within the trials it performs. The trials are distributed evenly across the threads, with the last thread assigned any remaining trials when the total number of trials is not perfectly divisible by the number of available threads. Once all threads have completed their assigned trials, the main thread gathers the results and determines the global minimum. This shared-memory approach leverages multi-core processors to accelerate the optimization process within a single computational node. The corresponding implementation is available here.
-
Multiprocessing (MPI): In the MPI-based parallelization scheme, the optimization trials are distributed across multiple processes, with each process assigned a subset of trials based on its rank. Each process executes its trials and computes a local minimum. The main process collects the local minima from all processes using an MPI gather operation and identifies the global minimum by comparing these values. For MPI communication, the MPI Rust bindings provided by the mpi crate are utilized. The code for the multiprocessing scheme can be found here.
-
Hybrid Model (MPI processes + threads): A combination of multithreading and MPI-based multiprocessing is employed in this scheme. Each MPI process spawns multiple threads to further parallelize the workload within its assigned subset of trials. The threads within each MPI process independently execute the Hooke-Jeeves method on their allocated trials and compute their respective thread-local minima. Once all threads within a process complete execution, the main thread of that process identifies the process-local minimum by comparing the thread-local minima. The process-local minima are then communicated to the main MPI process, which gathers them using an MPI gather operation and determines the global minimum by comparing these values. This design efficiently combines multithreading for intra-process parallelism with MPI-based multiprocessing for inter-process parallelism. The implementation for this hybrid approach is available here.
Note
The parallel implementations are based on the sequential version of the algorithm, which initiates
To evaluate and compare the performance of the different parallelization schemes implemented for the Hooke-Jeeves method, the benchmarking script was utilized. For each combination of workers' arguments, five runs were executed and the average result was taken. The tests were conducted using a fixed number of 32 variables (nvars
) and 65536 trials (ntrials
).
All executions were conducted on an Nvidia DGX system (128 logical CPU cores [64 physical cores - 2 threads per core], 1 TB of DDR4 RAM).
Figure 1: Comparison of speedup for multithreading and MPI-based multiprocessing schemes
The graph above shows a comparison of speedup between the multithreading and MPI-based multiprocessing schemes, demonstrating how performance scales with different configurations.
Figure 2: Performance comparison of the hybrid scheme with 1, 2 and 4 threads per MPI process
Figure 2 illustrates the performance of the hybrid scheme, comparing the speedup achieved with 1, 2 and 4 threads per MPI process.
- Sofotasios Argiris | a.sofotasios@ceid.upatras.gr
- Metaxakis Dimitris | d.metaxakis@ceid.upatras.gr
Distributed under the [MIT] License. See LICENSE.md
for more details.