Using MPI on the AVIDD clusters is a little problematic because
there are several MPI versions and implementations there.
What is of special interest to us is MPI-2, because MPI-IO is
integral to it. For this reason we have installed a beta version
of Argonne's MPICH2 in /N/hpc/mpich2.
This directory should be mounted on all computational nodes and head nodes of
both IUPUI and IUB clusters.
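A quick sanity check, if you want one, is to confirm that the directory is visible from the head node you are logged into:

  $ ls /N/hpc/mpich2/bin

You should see mpdboot, mpiexec and the other MPICH2 commands discussed below.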
You will also need to define an environment variable LD_LIBRARY_PATH (because there are dynamic run-time libraries in /N/hpc/mpich2/lib) and an environment variable MPD_USE_USER_CONSOLE. Without MPD_USE_USER_CONSOLE defined in your environment, the MPICH2 execution engine is going to do some rather weird things with the name of the socket file, with the sad effect that correct communication within the engine will not get established and the engine will shut down.
But first and foremost you must create a file .mpd.conf in your home directory. This file must be readable by you only, and you must be the only person allowed to write to it too:

  $ cd
  $ touch .mpd.conf
  $ chmod 600 .mpd.conf

Now you must enter the following line in this file:

  password=yoopee

Replace the word ``yoopee'' with your favourite password, of course. Do not use your AVIDD or your IU Net password. This password is for the MPICH2 MPD system only.
I forgot to tell you about this in the laboratory class and that is why MPICH2 worked for me only. Then I forgot about it altogether, and ended up very perplexed and suspected MPICH2 of the most horrible bugs imaginable. MPICH2 is a beta release at this stage, to be sure, so surprises are possible, but it's not this bad.
The easiest way to get your PATH, LD_LIBRARY_PATH and MPD_USE_USER_CONSOLE right is to copy the .bashrc, .bash_profile and .inputrc files from my home directory to your home directory.
Proceed as follows. After you have logged on, issue the commands:
  $ cd
  $ cp .bashrc .bashrc.ORIG
  $ cp .bash_profile .bash_profile.ORIG
  $ cp .inputrc .inputrc.ORIG
  $ cp ~gustav/.bashrc .bashrc
  $ cp ~gustav/.bash_profile .bash_profile
  $ cp ~gustav/.inputrc .inputrc
  $ chmod 755 .bashrc .bash_profile
  $ chmod 644 .inputrc

Having done this, log out and log in again.
If you know what you are doing and if you prefer to use shells other than bash, have a look at these files and then set up your environment similarly. The most important thing is to make sure that $HOME/bin is in front of the command search path, so that you can override system commands with your own, and that /N/hpc/mpich2/bin is the second directory in your command search path, so that you'll get the MPI-2 start-up commands, as well as Python 2.3 and other tools used by MPI-2, in place of whatever may be currently installed on the system. The system-wide version of Python, for example, is older and doesn't work with MPICH2.
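For bash users, the relevant settings boil down to a few lines. The following is a minimal sketch of what they might look like in your .bashrc; my actual dot files may contain more, so treat this as an approximation rather than a verbatim copy:

  # Private commands first, MPICH2 second (sketch; adjust paths to taste).
  export PATH=$HOME/bin:/N/hpc/mpich2/bin:$PATH
  # MPICH2's dynamic run-time libraries live here.
  export LD_LIBRARY_PATH=$HOME/lib:/N/hpc/mpich2/lib
  # Required by the MPICH2 MPD system, as discussed above.
  export MPD_USE_USER_CONSOLE=yes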
To check that everything is as it ought to be, try the following commands:

  gustav@bh1 $ cd
  gustav@bh1 $ ls -l .mpd.conf
  -rw-------    1 gustav   ucs    16 Oct  2 18:58 .mpd.conf
  gustav@bh1 $ cat .mpd.conf
  password=frabjous
  gustav@bh1 $ env | grep PATH
  LD_RUN_PATH=/N/B/gustav/lib:/N/hpc/mpich2/lib
  LD_LIBRARY_PATH=/N/B/gustav/lib:/N/hpc/mpich2/lib
  MANPATH=/N/B/gustav/man:/N/B/gustav/share/man:/N/hpc/mpich2/man:\
  /N/hpc/mpich2/share/man:/usr/local/man:/usr/local/share/man:\
  /usr/man:/usr/share/man:/usr/X11R6/man
  PATH=/N/B/gustav/bin:/N/hpc/mpich2/bin:/usr/local/bin:/bin:\
  /usr/bin:/usr/X11R6/bin:/usr/pbs/bin:/usr/local/hpss:.
  gustav@bh1 $ env | grep MPD
  MPD_USE_USER_CONSOLE=yes
  gustav@bh1 $

The first directory in PATH, of course, should be replaced with your own private bin directory.
Now you should run the following command on the IUB cluster:

  gustav@bh1 $ for i in `cat ~gustav/.bcnodes`
  > do
  >   echo -n "$i: "
  >   ssh $i date
  > done
  bc01-myri0: Wed Oct  1 16:12:56 EST 2003
  bc02-myri0: Wed Oct  1 16:12:56 EST 2003
  bc03-myri0: Wed Oct  1 16:12:56 EST 2003
  bc04-myri0: Wed Oct  1 16:12:57 EST 2003
  ...
  bc93-myri0: Wed Oct  1 16:13:39 EST 2003
  bc94-myri0: Wed Oct  1 16:13:39 EST 2003
  bc95-myri0: Wed Oct  1 16:13:39 EST 2003
  bc96-myri0: Wed Oct  1 16:13:40 EST 2003
  gustav@bh1 $

and similarly on the IUPUI cluster:
  gustav@ih1 $ for i in `cat ~gustav/.icnodes`
  > do
  >   echo -n "$i: "
  >   ssh $i date
  > done
  ic01-myri0: Wed Oct  1 16:15:14 EST 2003
  ic02-myri0: Wed Oct  1 16:15:14 EST 2003
  ic03-myri0: Wed Oct  1 16:15:15 EST 2003
  ic04-myri0: Wed Oct  1 16:15:15 EST 2003
  ...
  ic93-myri0: Wed Oct  1 16:18:24 EST 2003
  ic94-myri0: Wed Oct  1 16:18:24 EST 2003
  ic95-myri0: Wed Oct  1 16:18:24 EST 2003
  ic97-myri0: Wed Oct  1 16:18:25 EST 2003

Please let me know if these commands hang on any of the nodes. The files ~gustav/.bcnodes and ~gustav/.icnodes contain the lists of currently functional computational nodes with working Myrinet interfaces at IUB and IUPUI respectively. These lists may change every now and then, in which case we may have to repeat this procedure.
The purpose of this procedure is to populate your
~/.ssh/known_hosts file with the computational nodes' keys. The
ssh command inserts the key automatically in your
known_hosts if it is not there. But in the process it writes
a message on standard output that may confuse the MPICH2 engine.
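If you would rather not answer each first-connection prompt by hand, ssh's StrictHostKeyChecking option can accept new host keys automatically. This is a convenience sketch, not part of the original procedure:

  # Accept and record each node's host key without the interactive prompt.
  for i in `cat ~gustav/.bcnodes`
  do
    ssh -o StrictHostKeyChecking=no $i true
  done

Use ~gustav/.icnodes instead on the IUPUI cluster.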
Now you are almost ready to run your first MPI job. First copy two more files from my home directory. Do it as follows:
  $ cd
  $ [ -d bin ] || mkdir bin
  $ [ -d PBS ] || mkdir PBS
  $ cp ~gustav/bin/hellow2 bin
  $ chmod 755 bin/hellow2
  $ cp ~gustav/PBS/mpi.sh PBS
  $ chmod 755 PBS/mpi.sh

Now submit the job to PBS as follows:
  $ cd ~/PBS
  $ qsub mpi.sh
  $ qstat | grep `whoami`
  21303.bh1        mpi              gustav                  0 R bg
  $ !!
  21303.bh1        mpi              gustav                  0 R bg
  $

After you have submitted the job, you can monitor its progress through the PBS system with
  $ qstat | grep `whoami`

every now and then. But the job should run quickly, unless the system is very busy. If everything works as it ought to, you will find two files, mpi_err and mpi_out, in your working directory after the job completes. The first file should be empty and the second will contain the output of the job, which should be similar to:
  $ cat mpi_err
  $ cat mpi_out
  Local MPD console on bc68
  bc68_33575
  bc47_34123
  bc46_34056
  bc49_34697
  bc48_33551
  bc53_34095
  bc54_35385
  bc55_34714
  time for 100 loops = 0.124682068825 seconds
  0: bc68
  2: bc46
  1: bc47
  3: bc49
  4: bc48
  6: bc54
  5: bc53
  7: bc55
  bc68: hello world from process 0 of 8
  bc47: hello world from process 1 of 8
  bc46: hello world from process 2 of 8
  bc49: hello world from process 3 of 8
  bc48: hello world from process 4 of 8
  bc53: hello world from process 5 of 8
  bc54: hello world from process 6 of 8
  bc55: hello world from process 7 of 8
  $

This should work on both AVIDD clusters.
Let us have a look at the PBS script:
  gustav@bh1 $ cat mpi.sh
  #PBS -S /bin/bash
  #PBS -N mpi
  #PBS -o mpi_out
  #PBS -e mpi_err
  #PBS -q bg
  #PBS -m a
  #PBS -V
  #PBS -l nodes=8
  NODES=8
  HOST=`hostname`
  echo Local MPD console on $HOST
  # Specify Myrinet interfaces on the hostfile.
  grep -v $HOST $PBS_NODEFILE | sed 's/$/-myri0/' > $HOME/mpd.hosts
  # Boot the MPI2 engine.
  mpdboot --totalnum=$NODES --file=$HOME/mpd.hosts
  sleep 10
  # Inspect if all MPI nodes have been activated.
  mpdtrace -l
  # Check the connectivity.
  mpdringtest 100
  # Check if you can run trivial non-MPI jobs.
  mpdrun -l -n $NODES hostname
  # Execute your MPI program.
  mpiexec -n $NODES hellow2
  # Shut down the MPI2 engine and exit the PBS script.
  mpdallexit
  exit 0
  gustav@bh1 $

There is a new PBS directive here, which we haven't encountered yet. The option
-l lets you specify the list of resources required for the job. In this case there is only one item on the list, nodes=8, and this item states that you need eight nodes from PBS in order to run the job. PBS is going to return the names of the nodes in a file, whose name is conveyed in the environment variable PBS_NODEFILE. The nodes are listed in this file one per line.
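Because the nodes arrive one per line, the node count does not have to be hard-coded the way the script does with NODES=8. A small, hypothetical variation derives it from the file itself:

  # Count the lines of $PBS_NODEFILE to learn how many nodes PBS granted.
  NODES=`wc -l < $PBS_NODEFILE`

This keeps the script consistent if you later change the nodes= request.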
Also observe that we have used the -V directive. By doing this we have imported all environment variables of the submitting shell into the PBS job, including the MPD_USE_USER_CONSOLE variable, which is essential for the correct operation of the MPICH2 engine.
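If you would rather not depend on -V for this, the variable could also be exported explicitly at the top of the script, e.g. (the value matches the env output shown earlier):

  # Explicit alternative to relying on #PBS -V for this one variable.
  export MPD_USE_USER_CONSOLE=yes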
The first thing we do in the script is to convert the node names to their Myrinet equivalents. This is easy to do, because the Myrinet names are obtained by appending -myri0 to the name of the node, as returned in $PBS_NODEFILE. This is what the first command in the script does:

  grep -v $HOST $PBS_NODEFILE | sed 's/$/-myri0/' > $HOME/mpd.hosts

There is one complication here though. We are removing from this list the name of the node on which the script runs, with the command grep -v $HOST. This is because MPICH2 is going to create a process on this host anyway. If we left this host's name in the file, MPICH2 would create two processes on it.
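To make the transformation concrete, here is a hypothetical run-through, using node names of the kind that appear in the job output above. If the job's console runs on bc68 and $PBS_NODEFILE contains:

  bc68
  bc47
  bc46

then grep -v bc68 drops the first line and sed appends the suffix, so $HOME/mpd.hosts ends up as:

  bc47-myri0
  bc46-myri0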
Once the names returned in $PBS_NODEFILE have been converted to the Myrinet names, we save them in $HOME/mpd.hosts.
Now we are ready to start the MPICH2 engine. The command that does this is:

  mpdboot --totalnum=$NODES --file=$HOME/mpd.hosts

Program mpdboot is a Python 2.3 script that boots the MPICH2 engine by spawning MPICH2 supervisory processes, called mpds (pronounced ``em-pea-dee-s''), on the nodes specified in $HOME/mpd.hosts.
We have to give
mpdboot a few seconds to complete its job.
We do this by telling the script to
sleep for 10 seconds.
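A fixed sleep is a blunt instrument; on a busy system ten seconds may not be enough. A more careful script might poll mpdtrace until the expected number of mpds answer. This is a sketch of the idea, not part of the original script:

  # Wait up to ~20 seconds for all $NODES mpds to report in.
  # mpdtrace prints one node name per line.
  for attempt in 1 2 3 4 5 6 7 8 9 10
  do
    [ `mpdtrace | wc -l` -eq $NODES ] && break
    sleep 2
  done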
The mpds are spawned by ssh, which is why we had to get all the ssh keys in place earlier. Silly problems may show up if the keys are not there.
The command mpdtrace -l inspects the MPICH2 engine and lists the names of all the nodes on which mpds are running; the -l option makes it also list the names of the sockets used by the mpds to communicate with each other.
The next command, mpdringtest 100, times a simple message going around the ring of mpds, in this case 100 times.
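The loop count is just an argument; a larger one gives a steadier timing estimate at the cost of a longer start-up, e.g. (an illustrative variant):

  mpdringtest 1000

The ``time for 100 loops'' line in the job output above is what this test produces.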
These two commands, mpdtrace and mpdringtest, tell us that the MPICH2 engine is ready. We can now execute programs on
it. These don't have to be MPICH2 programs though. You can
execute any UNIX program under the MPICH2 engine. But if they are
not MPICH2 programs, they will not communicate with each other.
You will just get a number of independent instantiations of those
programs running on individual nodes. The script demonstrates this
by running the UNIX command
hostname under the MPICH2 engine:
  mpdrun -l -n $NODES hostname

The option -l tells mpdrun to attach process labels to any output that the instantiations of hostname on the MPICH2 nodes may produce.
At long last we commence the execution of a real MPICH2 program.
The program's name is
hellow2. It is an MPI version of
``Hello World''. This program should be picked up from your $HOME/bin directory, assuming that it is present in your command PATH. The command to run
this program under the MPICH2 engine is
  mpiexec -n $NODES hellow2

Observe that we don't have to use all the nodes given to us by PBS, but, of course, it would be silly not to, unless there is a special reason for it. Program mpiexec is not specific to MPICH2. The MPI-2 specification says that such a program must be provided for the execution of MPI jobs. There was no such specification in the original MPI.
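For instance, nothing stops you from running on fewer processes than PBS granted (a hypothetical variant):

  mpiexec -n 4 hellow2

The remaining nodes would simply sit idle for the duration of the job.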
Once hellow2 exits, we are done, and we shut down the MPICH2 engine with mpdallexit before leaving the PBS script.