-
I am using Ubuntu 20.04 on an HP Z-600. I cloned the repository with git clone and ran ./autogen.sh, which failed with errors:
mpiuser@BD-Main:~/mpich$ ./autogen.sh
I have installed Python 3.6 and receive the same error. I searched for hwloc and installed it successfully with sudo apt-get install hwloc, but this does not correct the problem.
When I install from the tarball mpich-4.0.1.tar.gz instead, configure, make and make install all succeed, but running across different nodes hangs with:
mpiuser@BD-Main:~$ mpiexec -f machinefile -np 3 ./examples/cpi
I have searched and cannot find a way to install the missing library to make the tarball install work. There seems to be a problem with the hwloc libraries. If anyone has a solution, it would be appreciated. Thank you.
-
Run git submodule update --init after fresh clone to get the submodules.
-
Thanks
On Sat., Mar. 26, 2022, 10:28 p.m. Hui Zhou wrote:
> Run git submodule update --init after fresh clone to get the submodules.
-
I did a fresh clone, then entered:
git submodule update --init
and received the following error:
***@***.***:~$ git submodule update --init
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Do I have to enter the http site in this command? Silly question. I'll try
it.
Yours,
Bruce
-
Sorry, did not work. This is the output:
***@***.***:~$ git submodule update --init https://github.com/pmodels/mpich.git
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
***@***.***:~$
Thanks for your help and time. Maybe I'm not entering the command
correctly. Here is the entire output from new git clone command:
***@***.***:~$* git clone https://github.com/pmodels/mpich.git
Cloning into 'mpich'...
remote: Enumerating objects: 211393, done.
remote: Counting objects: 100% (180/180), done.
remote: Compressing objects: 100% (131/131), done.
remote: Total 211393 (delta 72), reused 91 (delta 47), pack-reused 211213
Receiving objects: 100% (211393/211393), 75.14 MiB | 4.17 MiB/s, done.
Resolving deltas: 100% (171431/171431), done.
***@***.***:~$* git submodule update --init
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
***@***.***:~$* git clone https://github.com/pmodels/mpich.git^C
***@***.***:~$* git submodule update --init https://github.com/pmodels/mpich.git
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
***@***.***:~$ *
I made the prompt in bold for easier reading. Thank you for your time and
help.
Yours,
Bruce
-
AHA! You first have to go into the cloned directory:
cd mpich
and then run the git command to get the submodules. So sorry for all the
emails. That fetched the submodules. OK, I'll try autogen.sh again. Thank
you for your patience.
Yours,
Bruce
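For anyone hitting the same "not a git repository" error, the sequence that worked here can be sketched as a small shell function. The function name fetch_mpich and the default directory name are only illustrative. The point it shows is that the submodule command has to run inside the cloned working tree, not in the directory where git clone was typed.

```shell
#!/bin/sh
# Sketch: fetch the MPICH sources together with their submodules.
# fetch_mpich and the default directory "mpich" are illustrative names.
fetch_mpich() {
    dest="${1:-mpich}"
    git clone https://github.com/pmodels/mpich.git "$dest" || return 1
    # Submodule commands only work inside the working tree, so change
    # into the clone first. The subshell leaves the caller's directory alone.
    (
        cd "$dest" || exit 1
        git submodule update --init || exit 1
        ./autogen.sh
    )
}
```

Calling fetch_mpich with no argument clones into ./mpich. Passing --recurse-submodules to git clone should fetch the submodules in the same step, which avoids the separate cd entirely.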
-
Sorry,
I did get the repository and submodules and ran autogen.sh successfully.
However, I get the same error that I get when I compile from the tarball
with hwloc included in the build, as you suggested in the other
thread.
This is the output:
***@***.***:~$ mpiexec -f machinefile -n 3 ./examples/cpi
/home/mpiuser/mpich-install/bin/hydra_pmi_proxy: error while loading shared
libraries: libhwloc.so.15: cannot open shared object file: No such file or
directory
***@***.*** Sending Ctrl-C to processes as requested
***@***.*** Press Ctrl-C again to force abort
This seems to be a problem for a number of people. I have a rar file with
the missing library but I don't know how to install it. I have attached
the rar file for you to examine if this would help.
Thank you for your time,
Bruce
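Before installing anything by hand, it may help to check whether the dynamic linker on a given machine already knows about the library named in the error. A minimal sketch, assuming a glibc system where ldconfig -p prints the linker cache; the helper name lib_in_cache is made up for illustration:

```shell
#!/bin/sh
# Sketch: check whether the dynamic linker cache lists a given library.
# lib_in_cache is an illustrative helper name.
lib_in_cache() {
    # ldconfig -p prints one cached library per line; grep exits
    # non-zero when the name does not appear.
    ldconfig -p | grep -F -- "$1"
}
```

Running lib_in_cache libhwloc.so.15 on the master and on each node shows where the library is missing. Libraries installed under a private prefix will not appear in this cache, so an empty result is a hint rather than proof.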
-
Yes, I did. It compiled and installed fine in both the tarball version and
the git clone version. However, there were execution errors.
***@***.***:~$* which mpicc
/home/mpiuser/mpich-install/bin/mpicc
***@***.***:~$* which mpiexec
/home/mpiuser/mpich-install/bin/mpiexec
***@***.***:~$* mpiexec -n 3 ./examples/cpi
Process 0 of 3 is on BD-Main
Process 1 of 3 is on BD-Main
Process 2 of 3 is on BD-Main
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.002373
***@***.***:~$* mpiexec -f machinefile -n 3 ./examples/cpi
./examples/cpi: error while loading shared libraries: libefa.so.1: cannot
open shared object file: No such file or directory
./examples/cpi: error while loading shared libraries: libefa.so.1: cannot
open shared object file: No such file or directory
./examples/cpi: error while loading shared libraries: libefa.so.1: cannot
open shared object file: No such file or directory
***@***.***:~$*
This is the machinefile contents:
***@***.***:~$* more machinefile
node1
node2
node3
***@***.***:~$*
Thank you for your help.
On Sun, Mar 27, 2022 at 3:18 PM Hui Zhou wrote:
> What is your configure line? Did you add the --with-hwloc=embedded option?
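For reference, a build that uses the option asked about above might look like the sketch below. The actual configure line was never posted in this thread, so this is an assumption: the default prefix is taken from the which mpicc output earlier in the thread, and the function name is illustrative.

```shell
#!/bin/sh
# Sketch: configure, build and install MPICH with its bundled hwloc.
# Run from the top of the MPICH source tree. The default prefix is an
# assumption based on the "which mpicc" output shown in this thread.
build_mpich_embedded_hwloc() {
    prefix="${1:-$HOME/mpich-install}"
    ./configure --prefix="$prefix" --with-hwloc=embedded || return 1
    make || return 1
    make install
}
```

With the embedded hwloc, the installed binaries should not need a system libhwloc at run time, which is the dependency that hydra_pmi_proxy failed to load earlier.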
-
Sorry. ldd of ./examples/cpi?? ./examples/cpi is installed on
the master node, which is BD-Main. The nodes are named BD-1, BD-2 and BD-3.
They are set as node1, node2 and node3 in /etc/hosts on the master node, i.e.
BD-Main, and that is the node running the mpiexec command.
This is the /etc/hosts file:
127.0.0.1 localhost
127.0.1.1 BD-Main
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
# The following sets up the local network for cluster
10.0.0.1 master
10.0.0.2 node1
10.0.0.3 node2
10.0.0.4 node3
I will try to hunt down and find libefa and install it on the slave nodes.
I'll get back to you.
Yours,
Bruce
On Mon, Mar 28, 2022 at 7:14 PM Hui Zhou wrote:
> What is the output of ldd ./examples/cpi?
> You have libefa installed on the login node BD-Main. The same library is
> not available on node1 to node3.
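ldd lists the shared libraries a binary needs and marks the ones it cannot resolve as "not found". Since the suspicion is that the nodes differ from BD-Main, one way to compare them is to run ldd on every host in the machinefile. This is only a sketch: check_nodes is an illustrative name, and it assumes passwordless ssh and that the binary sits at the same path on every host.

```shell
#!/bin/sh
# Sketch: report unresolved shared libraries for a binary on every host
# listed in a machinefile. Assumes passwordless ssh and that the binary
# exists at the same path on each host. check_nodes is an illustrative name.
check_nodes() {
    machinefile="$1"
    binary="$2"
    while read -r host rest; do
        # Skip blank lines and comment lines
        case "$host" in ''|'#'*) continue ;; esac
        # A host entry may carry a ":N" process count; strip it
        host="${host%%:*}"
        echo "== $host"
        # -n stops ssh from reading the rest of the machinefile on stdin
        ssh -n "$host" ldd "$binary" | grep 'not found' \
            || echo "   (no 'not found' lines)"
    done < "$machinefile"
}
```

For the setup in this thread the call would be check_nodes machinefile ./examples/cpi. A host that prints a "not found" line for libefa.so.1 is one where that library still has to be installed, or where the binary should be rebuilt without that dependency.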
-
I don't know what the ldd of ./examples/cpi means. Nevertheless, I have
found the library at this site:
https://github.com/linux-rdma/rdma-core
Should I download and install it, or just use this:
$ apt-get install build-essential cmake gcc libudev-dev libnl-3-dev
libnl-route-3-dev ninja-build pkg-config valgrind python3-dev cython3
python3-docutils pandoc
Thanks again for your help. The nodes were installed with Ubuntu 20.04
with updates and upgrades. I don't know why the nodes do not have the same
libraries as the master.
Yours,
Bruce
-
Slightly different error. I had to remove the path change in .bashrc and log out and back in to get the correct mpiexec in /usr. However, the error is now slightly different:
mpiuser@BD-Main:~$ mpiexec -f machinefile -n 3 ./examples/cpi
-
mpiuser@BD-Main:~$ mpiexec -f machinefile -n 3 ./cpi
-
mpiuser@BD-Main:~$ /usr/bin/mpiexec -f machinefile -n 3 ./cpi
-
Now I get the correct help file.
-
Sorry for all the posts and replies. I have to work now and will try again later. The environment variable and how it is handled may be corrupted somehow. There was a post somewhere on the internet about how to completely remove all instances of MPI, Open MPI and MPICH. I have to hunt that down again. It was a manual operation, and I can repeat it in /usr/local/bin. There were some hidden files in there that are probably still being picked up somehow. Thanks for your help and patience.
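When several MPI installs are mixed together, it can help to list every mpiexec that the shell could pick up, in the order PATH is searched. A minimal sketch in plain sh; the helper name find_on_path is made up for illustration:

```shell
#!/bin/sh
# Sketch: list every executable called "$1" along PATH, in search order.
# find_on_path is an illustrative helper name.
find_on_path() {
    name="$1"
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    IFS=$old_ifs
}
```

The first line printed by find_on_path mpiexec is the one the shell runs when mpiexec is typed without a path, and any further lines are leftover copies. In bash, type -a mpiexec gives similar information.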