This document is a customized version of the "Getting started with LAM/MPI" tutorial for using LAM/MPI at the Dalhousie University Faculty of Computer Science. It describes the steps needed to prepare and run an MPI session using LAM/MPI 6.5.6. The document is organized into eight sections:
- Introduction
- Preliminary setup (only needs to be executed once),
- Compiling MPI programs,
- Booting LAM/MPI,
- Running MPI programs,
- Shutting down LAM/MPI,
- Cleaning up after yourself, and
- An example.
This document does not cover in detail the ethics and proper etiquette of using public clusters for parallel computing. Users are nevertheless asked to consider the effects that improper use of LAM can have on other users of these machines.
MPI is suitable for parallel machines such as the IBM SP, SGI Origin, etc., but it also works well on clusters of workstations. Taking advantage of the clusters of workstations available at Dalhousie, we are interested in using MPI to treat such a cluster as a single parallel virtual machine with multiple nodes.
The MPI-1 Standard supports portability and platform-independent computing. As a result, users enjoy cross-platform development as well as heterogeneous communication. For example, MPI code written on the RS/6000 architecture running AIX can be ported to a SPARC architecture running Solaris with almost no modification.
LAM is a daemon-based implementation of MPI. Initially, the program lamboot spawns LAM daemons based on the list of host machines provided by the user. These daemons remain idle on the remote machines until they receive a message telling them to load the MPI binary and begin execution.
LAM uses a number of specialized commands which are not in most people's default path. The FCS installation of LAM/MPI is in /opt/MPI/lam/..., so you should add these lines to your .bashrc:
LAMHOME=/opt/MPI/lam
if [ -d ${LAMHOME} ]
then
    PATH=$PATH:${LAMHOME}/bin
    MANPATH=$MANPATH:${LAMHOME}/man
    LAMRSH="ssh -1 -x"   # Force the use of the v1 protocol
    export LAMHOME PATH MANPATH LAMRSH
    if [ $interactive -eq 1 -a -r $LAMHOME/etc/news -a "$lamnews_displayed" != "y" ]
    then
        cat $LAMHOME/etc/news
        export lamnews_displayed=y
    fi
fi
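After adding these lines, log out and back in, or re-read the file in your current shell so that the LAM commands can be found, for example:
$ source ~/.bashrc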
Next, perform the following steps to set up the ssh key file:
$ cd $HOME/.ssh
$ ssh-keygen -t rsa1
When asked for the filename, take the default (~/.ssh/identity). Enter a passphrase. This is going to be a password for your account, so you can choose your current password or a separate one (the SSH people suggest a different one, but they are very security conscious).
$ cat identity.pub >> authorized_keys
(For those who want the details of what is going on: this sets the public key in the identity file as allowed to log in. It is read at the sshd server end to find the key to use for the authentication challenge. The passphrase unlocks the private key, which is then used to decode the challenge so that you can send a correct response.)
Just in case SSH v2 spontaneously starts working on the workstations, you can also create a version 2 identity keyfile:
$ ssh-keygen -t dsa
and/or
$ ssh-keygen -t rsa
$ cat id_dsa.pub >> authorized_keys2
and/or
$ cat id_rsa.pub >> authorized_keys2
There are man pages for all of these commands (ssh, ssh-agent, ssh-add, ssh-keygen) if you would like to know more about them.
Now, test it:
$ ssh borg
You should be asked for the _passphrase_ for a filename instead of your login password. If that works, you are all set for now.
Because different compilers tend to generate different linkage symbols for the same routines/variables (particularly C++ compilers), we have LAM compiled for several different compilers (both C and C++) on every architecture.
As such, the lam_cshrc script will examine the CC and CXX environment variables to determine which compiler you are using. If the variables do not exist, lam_cshrc defaults to the native compiler for that architecture.
NOTE: If you switch compilers for a given program, you must set the CC and CXX environment variables, and re-source the lam_cshrc script so that your environment can be re-set.
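For example, switching to the GNU compilers might look like the following (a hypothetical csh session -- lam_cshrc is a csh script; the path is a placeholder for wherever the script is installed on your system):
% setenv CC gcc
% setenv CXX g++
% source /path/to/lam_cshrc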
In your working directory (where your MPI binaries will reside), you will need to create one or more hostfiles which provide a listing of the machines to be included in an MPI session. Note that even if your hostfile contains only a single machine name it is still possible to run and test your parallel code by instructing LAM to schedule all of your processes on your single machine (see the -np # option to mpirun).
Here is an example of a hostfile, one machine name per line:
caper
cueball
cujo
serroul
blueberry
tabasco
crispy
stop-a
On Borg there are a number of pre-existing host files that you can copy, edit, and use:
% cp /opt/MPI/lam/boot/floor2.bhost .
% cp /opt/MPI/lam/boot/floor3.bhost .
IMPORTANT: Once you copy one of these files over, you must edit it and move the name of the machine that you are logged onto to the first place in the list.
NOTE: the setup described above assumes a homogeneous environment, that is, all of the hosts are SPARC machines running Solaris.
All of the Dalhousie Faculty of Computer Science machines NFS-mount the same file system, i.e. your Borg files. Since writing to this file system involves network communication, you may not want to use it for program input/output files. Each machine has some local space in /tmp. You may use this local space by first creating a directory /tmp/username, where username is your Borg username. You may save whatever you like in this directory, but be warned that its contents will be erased periodically, in particular whenever the machine is rebooted.
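For example, a minimal sketch of setting up and using such a scratch directory on the machine you are logged into (username and the data file name are placeholders):
% mkdir /tmp/username
% cp mydata.in /tmp/username/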
We present here a simple C program that passes a message around a ring of processors.
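The actual source file (MPI_C_SAMPLE.c) is fetched with wget in the example session at the end of this document. What follows is only a minimal sketch of such a ring program, written for this tutorial rather than copied from the course page, which produces output along the lines shown in that session. It assumes it is started on at least two processes.

/* ring.c -- a sketch of a ring-passing MPI program (not the course's
 * MPI_C_SAMPLE.c).  Process 0 reads a trip count and sends it to its
 * neighbour; the value circles the ring, and process 0 decrements it
 * each time it comes back, until it reaches zero. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, next, prev, num;
    const int tag = 201;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    next = (rank + 1) % size;          /* neighbour we send to       */
    prev = (rank + size - 1) % size;   /* neighbour we receive from  */

    if (rank == 0) {
        printf("Enter the number of times around the ring: ");
        fflush(stdout);
        scanf("%d", &num);
        printf("Process 0 sending %d to %d\n", num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    }

    /* Every process forwards the value until it reaches zero. */
    while (1) {
        MPI_Recv(&num, 1, MPI_INT, prev, tag, MPI_COMM_WORLD, &status);
        if (rank == 0) {
            printf("Process 0 received %d\n", num);
            num--;
            printf("Process 0 decremented num\n");
        }
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        if (num == 0)
            break;
    }

    /* Process 0 absorbs the final zero coming back around the ring. */
    if (rank == 0)
        MPI_Recv(&num, 1, MPI_INT, prev, tag, MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}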
The simplest and most straightforward way to compile MPI programs under the LAM implementation is to modify an existing Makefile, such as the one fetched in the example session at the end of this document. We suggest that you adapt that Makefile to your liking and expand on it as you become more comfortable with LAM.
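For instance, LAM provides compiler wrapper scripts (such as mpicc) alongside its other commands; a sketch of compiling the sample into a binary called hello, assuming the wrapper is on your PATH and the source file is the MPI_C_SAMPLE.c fetched below:
% mpicc -O -o hello MPI_C_SAMPLE.c
The wrapper simply invokes the underlying C compiler with LAM's include and library flags added, so any options you would normally pass to cc or gcc can also be passed to mpicc.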
$ eval `ssh-agent`
$ ssh-add
Now, to test that it is all working, try
$ ssh borg
It should log you in without asking for a password. You are now ready to proceed to lamboot or recon.
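recon takes the same hostfile as lamboot and verifies that LAM can reach and start a process on every machine listed, which makes it a good sanity check before booting, for example:
$ recon -v hostfile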
Every MPI session must begin with one and only one lamboot command. To initiate an MPI session under LAM, at the Unix prompt type:
% lamboot -v hostfile
The -v flag enables verbose output, so that you can observe what LAM is doing. Upon a successful invocation using the hostfile listed above, you should see something like the following (depending on your hostfile):
LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame

Executing hboot on n0 (thomas.helios.nd.edu)...
Executing hboot on n1 (austen.helios.nd.edu)...
Executing hboot on n2 (larson.helios.nd.edu)...
Executing hboot on n3 (twain.helios.nd.edu)...
Executing hboot on n4 (trudeau.helios.nd.edu)...
topology done
If you are having problems with lamboot, check the LAM FAQ under the section "Booting LAM" for lots of helpful information.
A quick and simple way to check the status of the machines in the current MPI session is with the tping N command. tping can be very helpful when you suspect that an MPI program has hung or when you are not observing the expected results. The N command line argument tells LAM to "ping" all of the hosts in the hostfile.
Unless you tell tping to execute a specific number of times, it will run until you hit control-c (just like the Unix ping). You can use the -c flag to specify the number of pings to execute:
tping -c3 N
SPMD
To run your MPI binaries, use the command mpirun. For example, to run the sample program presented above (assuming that the binary is called "hello"):
% cd dir_where_your_code_is
% mpirun N hello
The N means "run it on all the machines in your hostfile."
mpirun has several options that can be supplied:
-O
This option indicates that all the machines that you will be running on are homogeneous -- no data conversions need to be made. This is not necessary, but it can make your program run faster, since LAM will not check whether endian conversions need to be made.
N
As stated above, this means "run it on all the machines in your hostfile." If you want to run on fewer than all N machines, you can specify a subset to run on. For example, n0-n2 specifies running on the first three machines in the hostfile.
-lamd
Use the "daemon based" sending mode. This means that all messages will go through the LAM daemons rather than directly from MPI rank to MPI rank (the so-called "client-to-client" mode, which is the default for LAM/MPI). The LAM daemons can provide extensive monitoring capabilities. See the LAM FAQ for a longer discussion of this issue; there are two relevant questions about this in the Running LAM/MPI Applications section: "What's the difference between 'daemon' and 'C2C' mode?" and "Why would I use 'daemon' over 'C2C' mode?"
Be conservative: run tping -c3 N first to check that the session is working.
MPMD
Although it is common to write SPMD code, LAM can also handle the MPMD style of execution (i.e., executing different binaries on different ranks).
Instead of giving mpirun the name of a single binary, you give mpirun the name of an application schema file. The application schema (or "appschema") simply lists the nodes that you want to use and the name of the binary to execute on each, along with any relevant command line options that your binary may require.
For example, the following appschema starts master on n0, and starts slave on all the other nodes (n1-7, in this case). Note that we're passing some flags to the slave program, too:
n0 master
n1-7 slave -verbose -loadbalance
To run this appschema, you still use mpirun, but you no longer need to specify nodes or an application name -- you simply specify the appschema file name (let's say that the above example's file name is esha-homework):
% mpirun esha-homework
This will start the respective binaries on their respective nodes.
Analogous to the sequential UNIX ps command is mpitask, which displays the current status of the MPI program(s) being executed. The -h command line option provides a brief synopsis of this command.
Similar to the mpitask command, the mpimsg command gives information about running MPI programs. mpimsg shows all pending messages in the current MPI environment. With mpimsg, you can see messages that are "left over" (i.e. messages that are never received) even after your MPI program has completed.
This command is not very useful if you are running in the "client-to-client" mode of LAM/MPI (which is the default). You must specifically say -lamd on your mpirun command line for this command to work as expected.
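For example, a sketch of a session where mpimsg is useful (hello is the binary from the earlier example):
% mpirun -lamd N hello
% mpimsg
mpimsg will list any messages that the program left unreceived.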
REMEMBER: Correct MPI programs do not leave messages lying around; all messages should be received during the run of your program.
To kill the running MPI program and erase all pending messages, use lamclean:
% lamclean -v
NOTE: lamclean should only have to be used for debugging -- i.e. programs that hang, messages that are left around, etc. Correct MPI programs should terminate properly and clean up all their messages.
It is extremely important that each MPI session be shut down using the LAM shutdown commands. To properly shut down the current MPI session:
% lamhalt
Unless some of your LAM/MPI nodes have crashed, this will successfully shut down LAM/MPI and return you to a command prompt. If lamhalt hangs and does not return to a command prompt, you will need to use the wipe command:
% wipe -v hostfile
You should observe that the LAM daemons on all of the hosts in your hostfile are killed. If you use the hostfile listed above, you should see:
LAM 6.5.6 - University of Notre Dame

Executing tkill on n0 (thomas.helios.nd.edu)...
Executing tkill on n1 (austen.helios.nd.edu)...
Executing tkill on n2 (larson.helios.nd.edu)...
Executing tkill on n3 (twain.helios.nd.edu)...
Executing tkill on n4 (trudeau.helios.nd.edu)...
IMPORTANT: You must terminate each MPI session with the lamhalt and/or wipe commands, because leftover daemons will cause unpredictable results and can even crash the machines in subsequent MPI sessions. Leaving daemons behind is poor and unsociable computing practice.
You should also shut down your ssh-agent. Otherwise, anyone sitting down at the machine might be able to log into all the FCS machines as you without a password. (If you look at the SSH_AUTH_SOCK file, it is owned by you and only readable by you, so it should be safe, but why take the chance?)
$ ssh-agent -k
The Dalhousie Unix machines that you are using are a shared resource. Unless everyone behaves, the machines become unstable. Please clean up after yourself! Make sure you leave these machines as you found them!
- Be sure to halt and wipe your cluster before you log off. Check that the lamd process is really dead on all machines you have been using! If some of the lamd processes are still around, try:
caper$ user=`whoami`; for host in `grep -v '^#' floor3.bhost`; do echo $host; ssh $host "ps -fu $user; skill lamd; sleep 2; ps -fu $user"; done
- Be sure to clean up any files that you have left in the /tmp directory.
- Be sure that your program closes all files correctly; otherwise the machines will run out of file handles and crash.
caper$ wget http://www.cs.dal.ca/~arc/resources/MPI/sampleCode//MPI_C_SAMPLE.c
caper$ wget http://www.cs.dal.ca/~arc/resources/MPI/sampleCode//makefile
caper$ gmake
caper$ cp /opt/MPI/lam/boot/floor3.bhost .
[Edit floor3.bhost to make sure the present machine is first]
caper$ eval `ssh-agent`
caper$ ssh-add
[Enter passphrase for testuser@borg: foobar]
caper$ recon -v floor3.bhost
caper$ lamboot -v floor3.bhost
caper$ tping -c3 N
caper$ mpirun N hello
Enter the number of times around the ring: 2
Process 0 sending 2 to 1
Process 0 received 2
Process 0 decremented num
[....]
caper$ lamhalt
caper$ ssh-agent -k