make scanners MPI issue #1

Open
aseshkdatta opened this issue Dec 26, 2022 · 6 comments

Comments

@aseshkdatta

Dear Colleague,

cmake -DCMAKE_CXX_COMPILER=/opt/gcc-8.2/bin/g++ -DCMAKE_C_COMPILER=/opt/gcc-8.2/bin/gcc -DWITH_MPI=ON ..

make -j8 scanners

results in the following repeating error messages regarding MPI.

Could you please shed some light on the issue?

Thanks and regards.

=====================================================
[ 41%] Building CXX object src/CMakeFiles/Minuit2.dir/MnSeedGenerator.cxx.o
In file included from /opt/ICS_2013/impi/4.1.3.048/intel64/include/mpi.h:1279,
from /ws2scratch/asesh/Gambit2/gambit_2.3/ScannerBit/installed/minuit2/6.23.01/inc/Minuit2/MPIProcess.h:20,
from /ws2scratch/asesh/Gambit2/gambit_2.3/ScannerBit/installed/minuit2/6.23.01/src/MPIProcess.cxx:11:
/opt/ICS_2013/impi/4.1.3.048/intel64/include/mpicxx.h:95:2: error: #error "SEEK_SET is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h"
#error "SEEK_SET is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h"
^~~~~
/opt/ICS_2013/impi/4.1.3.048/intel64/include/mpicxx.h:99:2: error: #error "SEEK_CUR is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h"
#error "SEEK_CUR is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h"
^~~~~
/opt/ICS_2013/impi/4.1.3.048/intel64/include/mpicxx.h:104:2: error: #error "SEEK_END is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h"
#error "SEEK_END is #defined but must not be for the C++ binding of MPI. Include mpi.h before stdio.h"
^~~~~
[ 33%] No install step for 'multinest_3.12'
[ 42%] Building CXX object src/CMakeFiles/Minuit2.dir/MnStrategy.cxx.o
[ 43%] Building CXX object src/CMakeFiles/Minuit2.dir/MnTiny.cxx.o

@aseshkdatta
Author

Dear Colleague,

It seems that I have managed to bypass the issue by having openmpi installed in my working area.
I'm now looking closely into the functioning of mpi.

Regards.

@tegonzalo
Contributor

Hi! That error seems to point to an issue with the MPI installation, as it is failing on an MPI header file. Please let us know if it works with your local OpenMPI. Thanks!

@aseshkdatta
Author

Thanks, Thomas.

Here is the nature of the problem with my local openmpi (v4.1.4) installation.
Could you please observe.

time mpirun -np 24 ./gambit -rf yaml_files/spartan.yaml
Diver run hangs with the following prompted on the terminal:

Point Suspicious (66): This is a demo for using suspicious points.
Point Suspicious (66): This is a demo for using suspicious points.
Diver run finished!
ScannerBit is waiting for all MPI processes to report their shutdown condition...

GAMBIT has finished successfully!

Calling MPI_Finalize...


^c on the gambit terminal prompts

^CCtrl-C caught... cleaning up processes
^CCtrl-C caught... cleaning up processes


while the 'top' command shows 'pmi_proxy' running continuously.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
411451 asesh 20 0 16636 1400 1116 R 99.7 0.0 17:04.65 pmi_proxy


Killing the process prompts on the gambit terminal the following:

[asesh@ws2 gambit_2.3]$kill -9 411451
Calling MPI_Finalize...
[mpiexec@ws2] control_cb (./pm/pmiserv/pmiserv_cb.c:717): assert (!closed) failed
[mpiexec@ws2] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@ws2] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@ws2] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

real 20m27.143s
user 0m0.074s
sys 0m0.330s
[asesh@ws2 gambit_2.3]$

@tegonzalo
Contributor

Ok, so GAMBIT finishes correctly, but it hangs at the end when releasing the MPI handles.

Fortunately this just seems to be a problem with the cleanup stage, so the scan ran perfectly and the results are untouched. So if you are using GAMBIT to run scans, then you should not worry that your results may be affected.

But we will look into the hanging issue. Thanks for reporting this.
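
One thing that may be worth checking for the hang: the `pmi_proxy` process and the `HYD_...` / `mpiexec.c` messages in the log above come from the Hydra process manager used by MPICH-derived MPI libraries (Intel MPI among them), whereas Open MPI's launcher identifies itself as "Open MPI". The following is a minimal sketch for seeing which launcher `mpirun` resolves to in the current shell. It assumes `mpirun` is on `PATH` and accepts `--version`; the helper names are hypothetical and not part of GAMBIT.

```python
import shutil
import subprocess


def classify_launcher(version_text):
    """Guess the MPI family from the text printed by `mpirun --version`."""
    text = version_text.lower()
    if "open mpi" in text or "open-mpi" in text:
        return "openmpi"
    if "hydra" in text or "intel(r) mpi" in text or "mpich" in text:
        return "mpich-like"
    return "unknown"


def describe_mpirun(executable="mpirun"):
    """Return (resolved path, guessed family) for the launcher found on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None, "not found"
    # Some launchers print their version banner on stderr, so read both streams.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, classify_launcher(result.stdout + result.stderr)
```

Calling `describe_mpirun()` in the same shell that launches GAMBIT returns the launcher's path and a guess at its family. If it reports `mpich-like` while GAMBIT was built against the local OpenMPI, the launcher and the library may not match, and that would be a candidate cause to rule out.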

@aseshkdatta
Author

Thanks a lot, Tomas, for your observation.

I am facing another issue while installing gambit on a different machine.
It appears when 'cmake' is run, as follows.

The loaded modules are

  1) gcc/8.2.0   3) python/3.9        5) openmpi/4.1.4
  2) gsl/2.7     4) boost/boost1.76   6) cmake/3.25.1

[asesh@ws1 build]$cmake -DCMAKE_CXX_COMPILER=/ws1scratch/apps/gcc/gcc-8.2/bin/g++ -DCMAKE_C_COMPILER=/ws1scratch/apps/gcc/gcc-8.2/bin/gcc -DWITH_MPI=ON -DMPI_C_COMPILER=/ws1scratch/apps/mpi4.14/bin/mpicc -DMPI_CXX_COMPILER=/ws1scratch/apps/mpi4.14/bin/mpicxx -DMPI_Fortran_COMPILER=/ws1scratch/apps/mpi4.14/bin/mpif90 ..

X Switching OFF FlexibleSUSY support for ALL models.
If you want to activate support for any model(s) please list them in the cmake flag -DBUILD_FS_MODELS= as a semi-colon separated list.
Buildable models are: MDM;CMSSM;MSSM;MSSMatMGUT;MSSM_mAmu;MSSMatMSUSY_mAmu;MSSMatMGUT_mAmu;MSSMEFTHiggs;MSSMEFTHiggs_mAmu;MSSMatMSUSYEFTHiggs_mAmu;MSSMatMGUTEFTHiggs;MSSMatMGUTEFTHiggs_mAmu;ScalarSingletDM_Z3;ScalarSingletDM_Z2
To build ALL models use ALL, All, or all.
To build NO models use None or none.
-- Configuring FlexibleSUSY - done.
-- Updating GAMBIT scanner cmake and related files
Traceback (most recent call last):
  File "/ws1scratch/asesh/Gambit/gambit_2.3/ScannerBit/scripts/scanner+_harvester.py", line 42, in <module>
    exec(compile(open(toolsfile, "rb").read(), toolsfile, 'exec')) # Python 2/3 compatible version of 'execfile'
  File "./Utils/scripts/harvesting_tools.py", line 39, in <module>
    import ctypes
  File "/ws1scratch/pkgs/python-3.9/lib/python3.9/ctypes/__init__.py", line 8, in <module>
    from _ctypes import Union, Structure, Array
ModuleNotFoundError: No module named '_ctypes'
CMake Error at cmake/utilities.cmake:72 (message):
Cmake failed because a GAMBIT python script failed. Culprit:
/ws1scratch/asesh/Gambit/gambit_2.3/ScannerBit/scripts/scanner+_harvester.py
Call Stack (most recent call first):
CMakeLists.txt:578 (check_result)

-- Configuring incomplete, errors occurred!
See also "/ws1scratch/asesh/Gambit/gambit_2.3/build/CMakeFiles/CMakeOutput.log".
See also "/ws1scratch/asesh/Gambit/gambit_2.3/build/CMakeFiles/CMakeError.log".
CMake Warning:
Value of Mathematica_KERNEL_HOST_SYSTEM_ID contained a newline; truncating

Noting that it is complaining about "ModuleNotFoundError: No module named '_ctypes' ",
we checked the internet, where it is suggested that 'libffi-devel' needs to be installed
for systems running on CentOS (which is what we have on this machine). However, this
did not help and the same error message is being prompted.

Kindly let me know your observations.
Thanks in advance.

@tegonzalo
Contributor

Hi!

That's probably because your python module does not know that you installed the libffi-devel library. Can you recompile the python environment?
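
`_ctypes` is a compiled extension that is built together with Python itself, so the libffi headers have to be present when the interpreter is compiled. To see whether a given interpreter has it, a quick check along these lines can be run before and after rebuilding. This is a sketch using only the standard library; `has_extension` is a hypothetical helper name.

```python
import importlib.util
import sys


def has_extension(name="_ctypes"):
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(name) is not None


# Show which interpreter is being tested, then report on _ctypes.
print(sys.executable)
print("_ctypes available:", has_extension())
```

It should be run with the same Python that cmake picks up, since a different interpreter on the machine may well have `_ctypes` while the one from the loaded module does not.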

Alternatively you can manually download the cpython package from https://github.com/python/cpython and install it in the system.

And as the very last resort, ctypes is only used in gambit to check the size of integers when creating type equivalency relations. So if you know exactly which system you are running on (either 32-bit or 64-bit), then you can replace line 89 of Utils/scripts/harvesting_tools.py with the following line

if re.search("32", member):

if you have a 32-bit system, or

if re.search("64", member):

for 64-bit systems.

And then you can remove the import ctypes line at the top of the file. Hopefully that would work.
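
If you are unsure which of the two cases applies, the standard library can report the native sizes without `ctypes`. The following is a sketch of that idea and not the code in `harvesting_tools.py`; `native_sizes` is a hypothetical helper name.

```python
import struct


def native_sizes():
    """Return native pointer and C long widths, in bits, for this interpreter."""
    return {
        "pointer_bits": struct.calcsize("P") * 8,  # "P" is void*
        "long_bits": struct.calcsize("l") * 8,     # "l" is native C long
    }


print(native_sizes())
```

A `pointer_bits` value of 64 indicates a 64-bit build of Python, which on a typical installation matches the operating system.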
