This document is a customized version of the "Getting started with LAM/MPI" tutorial for using LAM/MPI at the Dalhousie University Faculty of Computer Science. It describes the steps needed to prepare and run an MPI session using LAM/MPI 6.5.6. The document is organized into eight sections:
- Introduction
- Preliminary setup (only needs to be executed once),
- Compiling MPI programs,
- Booting LAM/MPI,
- Running MPI programs,
- Shutting down LAM/MPI,
- Cleaning up after yourself, and
- An example.
This document does not cover in detail the ethics and proper etiquette of using public clusters for parallel computing. Users are nevertheless asked to consider the effects that improper use of LAM can have on other users of these machines.
MPI is suitable for parallel machines such as the IBM SP, SGI Origin, etc., but it also works well on clusters of workstations. Taking advantage of the clusters of workstations available at Dalhousie, we are interested in using MPI to treat such a cluster as a single parallel virtual machine with multiple nodes.
The MPI-1 Standard supports portability and platform-independent computing. As a result, users enjoy cross-platform development as well as heterogeneous communication. For example, MPI code written on the RS/6000 architecture running AIX can be ported to a SPARC architecture running Solaris with almost no modification.
LAM is a daemon-based implementation of MPI. Initially, the program lamboot spawns LAM daemons based on the list of host machines provided by the user. These daemons remain idle on the remote machines until they receive a message telling them to load the MPI binary and begin execution.
LAM uses a number of specialized commands which are not in most people's default path. The FCS installation of LAM/MPI is in /opt/MPI/lam/..., so you should add these lines to your .bashrc:
LAMHOME=/opt/MPI/lam
if [ -d ${LAMHOME} ]
then
    PATH=$PATH:${LAMHOME}/bin
    MANPATH=$MANPATH:${LAMHOME}/man
    LAMRSH="ssh -1 -x"   # Force the use of the v1 protocol
    export LAMHOME PATH MANPATH LAMRSH
    if [ $interactive -eq 1 -a -r $LAMHOME/etc/news -a "$lamnews_displayed" != "y" ]
    then
        cat $LAMHOME/etc/news
        export lamnews_displayed=y
    fi
fi
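After adding these lines, log out and back in, or re-read the file in your current shell so that the LAM commands can be found, for example:
$ source ~/.bashrc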
Next, perform the following steps to set up the ssh key file:
$ cd $HOME/.ssh
$ ssh-keygen -t rsa1
When asked for the filename, take the default (~/.ssh/identity). Enter a passphrase. This is going to be a password for your account, so you can choose your current password or a separate one (the SSH people suggest a different one, but they are very security conscious).
$ cat identity.pub >> authorized_keys
(For those who want the details of what is going on: this sets the public key in the identity file as allowed to log in. It is read at the sshd server end to find the key to use for the authentication challenge. The passphrase unlocks the private key, which is then used to decode the challenge so that you can send a correct response.)
Just in case SSH v2 spontaneously starts working on the workstations, you can also create a version 2 identity keyfile:
$ ssh-keygen -t dsa
and/or
$ ssh-keygen -t rsa
$ cat id_dsa.pub >> authorized_keys2
and/or
$ cat id_rsa.pub >> authorized_keys2
There are man pages for all of these commands (ssh, ssh-agent, ssh-add, ssh-keygen) if you would like to know more about them.
Now, test it:
$ ssh borg
You should be asked for the _passphrase_ for a filename instead of your login password. If that works, you are all set for now.
Because different compilers tend to generate different linkage symbols for the same routines/variables (particularly C++ compilers), we have LAM compiled for several different compilers (both C and C++) on every architecture.
As such, the lam_cshrc script will examine the CC and CXX environment variables to determine which compiler you are using. If the variables do not exist, lam_cshrc defaults to the native compiler for that architecture.
NOTE: If you switch compilers for a given program, you must set the CC and CXX environment variables, and re-source the lam_cshrc script so that your environment can be re-set.
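For example, switching to the GNU compilers might look like the following (a hypothetical csh session -- lam_cshrc is a csh script; the path is a placeholder for wherever the script is installed on your system):
% setenv CC gcc
% setenv CXX g++
% source /path/to/lam_cshrc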
In your working directory (where your MPI binaries will reside), you will need to create one or more hostfiles which provide a listing of the machines to be included in an MPI session. Note that even if your hostfile contains only a single machine name it is still possible to run and test your parallel code by instructing LAM to schedule all of your processes on your single machine (see the -np # option to mpirun).
Here is an example of a hostfile, one machine name per line:
caper
cueball
cujo
serroul
blueberry
tabasco
crispy
stop-a
On Borg there are a number of pre-existing host files that you can copy, edit, and use:
% cp /opt/MPI/lam/boot/floor2.bhost .
% cp /opt/MPI/lam/boot/floor3.bhost .
IMPORTANT: Once you copy one of these files over, you must edit it and move the name of the machine that you are logged onto to the first place in the list.
NOTE: the setup described above assumes a homogeneous environment, that is, all of the hosts are SPARC machines running Solaris.
All of the Dalhousie Faculty of Computer Science machines NFS-mount the same file system, i.e. your Borg files. Since writing to this file system involves network communication, you may not want to use it for program input/output files. Each machine has some local space in /tmp. You may use this local space by first creating a directory /tmp/username, where username is your Borg username. You may save whatever you like in this directory, but be warned that its contents will be erased periodically, in particular whenever the machine is rebooted.
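For example, a minimal sketch of setting up and using such a scratch directory on the machine you are logged into (username and the data file name are placeholders):
% mkdir /tmp/username
% cp mydata.in /tmp/username/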
We present here a simple C program that passes a message around a ring of processors.
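The actual source file (MPI_C_SAMPLE.c) is fetched with wget in the example session at the end of this document. What follows is only a minimal sketch of such a ring program, written for this tutorial rather than copied from the course page, which produces output along the lines shown in that session. It assumes it is started on at least two processes.

/* ring.c -- a sketch of a ring-passing MPI program (not the course's
 * MPI_C_SAMPLE.c).  Process 0 reads a trip count and sends it to its
 * neighbour; the value circles the ring, and process 0 decrements it
 * each time it comes back, until it reaches zero. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, next, prev, num;
    const int tag = 201;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    next = (rank + 1) % size;          /* neighbour we send to       */
    prev = (rank + size - 1) % size;   /* neighbour we receive from  */

    if (rank == 0) {
        printf("Enter the number of times around the ring: ");
        fflush(stdout);
        scanf("%d", &num);
        printf("Process 0 sending %d to %d\n", num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    }

    /* Every process forwards the value until it reaches zero. */
    while (1) {
        MPI_Recv(&num, 1, MPI_INT, prev, tag, MPI_COMM_WORLD, &status);
        if (rank == 0) {
            printf("Process 0 received %d\n", num);
            num--;
            printf("Process 0 decremented num\n");
        }
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        if (num == 0)
            break;
    }

    /* Process 0 absorbs the final zero coming back around the ring. */
    if (rank == 0)
        MPI_Recv(&num, 1, MPI_INT, prev, tag, MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}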
The simplest and most straightforward way to compile MPI programs under the LAM implementation is to modify an existing Makefile, such as the one fetched in the example session at the end of this document. We suggest that you adapt that Makefile to your liking and expand on it as you become more comfortable with LAM.
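For instance, LAM provides compiler wrapper scripts (such as mpicc) alongside its other commands; a sketch of compiling the sample into a binary called hello, assuming the wrapper is on your PATH and the source file is the MPI_C_SAMPLE.c fetched below:
% mpicc -O -o hello MPI_C_SAMPLE.c
The wrapper simply invokes the underlying C compiler with LAM's include and library flags added, so any options you would normally pass to cc or gcc can also be passed to mpicc.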
$ eval `ssh-agent`
$ ssh-add
Now, to test that it is all working, try
$ ssh borg
It should log you in without asking for a password. You are now ready to proceed to lamboot or recon.
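recon takes the same hostfile as lamboot and verifies that LAM can reach and start a process on every machine listed, which makes it a good sanity check before booting, for example:
$ recon -v hostfile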
Every MPI session must begin with one and only one lamboot command. To initiate an MPI session under LAM, at the Unix prompt type:
% lamboot -v hostfile
The -v flag enables verbose output, so that you can observe what LAM is doing. Upon a successful invocation using the hostfile listed above, you should see something like the following (depending on your hostfile):
LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame

Executing hboot on n0 (thomas.helios.nd.edu)...
Executing hboot on n1 (austen.helios.nd.edu)...
Executing hboot on n2 (larson.helios.nd.edu)...
Executing hboot on n3 (twain.helios.nd.edu)...
Executing hboot on n4 (trudeau.helios.nd.edu)...
topology done
If you are having problems with lamboot, check the LAM FAQ under the section "Booting LAM" for lots of helpful information.
A quick and simple way to check the status of the machines in the current MPI session is with the tping N command. tping can be very helpful when you suspect that an MPI program has hung or when you are not observing the expected results. The N command line argument tells LAM to "ping" all of the hosts in the hostfile.
Unless you tell tping to execute a specific number of times, it will run until you hit control-c (just like the Unix ping). You can use the -c flag to specify the number of pings to execute:
tping -c3 N
SPMD
To run your MPI binaries, use the command mpirun. For example, to run the sample program presented above (assuming that the binary is called "hello"):
% cd dir_where_your_code_is
% mpirun N hello
The N means "run it on all the machines in your hostfile."
mpirun has several options that can be supplied:
-O
This option indicates that all the machines that you will be running on are homogeneous -- no data conversions need to be made. This is not necessary, but it can make your program run faster, since LAM will not check whether endian conversions need to be made.
N
As stated above, this means "run it on all the machines in your hostfile." If you want to run on fewer than all N machines, you can specify a subset to run on. For example, n0-n2 specifies running on the first three machines in the hostfile.
-lamd
Use the "daemon based" sending mode. This means that all messages will go through the LAM daemons rather than directly from MPI rank to MPI rank (the so-called "client-to-client" mode, which is the default for LAM/MPI). The LAM daemons can provide extensive monitoring capabilities. See the LAM FAQ for a longer discussion of this issue; there are two relevant questions about this in the Running LAM/MPI Applications section: "What's the difference between 'daemon' and 'C2C' mode?" and "Why would I use 'daemon' over 'C2C' mode?"
Be conservative: run tping -c3 N first to check that the session is working.
MPMD
Although it is common to write SPMD code, LAM can also handle the MPMD style of execution (i.e., executing different binaries on different ranks).
Instead of giving mpirun the name of a single binary, you give mpirun the name of an application schema file. The application schema (or "appschema") simply lists the nodes that you want to use and the name of the binary to execute on each, along with any relevant command line options that your binary may require.
For example, the following appschema starts master on n0, and starts slave on all the other nodes (n1-7, in this case). Note that we're passing some flags to the slave program, too:
n0 master
n1-7 slave -verbose -loadbalance
To run this appschema, you still use mpirun, but you no longer need to specify nodes or an application name -- you simply specify the appschema file name (let's say that the above example's file name is esha-homework):
% mpirun esha-homework
This will start the respective binaries on their respective nodes.
Analogous to the sequential UNIX ps command is mpitask, which displays the current status of the MPI program(s) being executed. The -h command line option provides a brief synopsis of this command.
Similar to the mpitask command, the mpimsg command gives information about running MPI programs. mpimsg shows all pending messages in the current MPI environment. With mpimsg, you can see messages that are "left over" (i.e. messages that are never received) even after your MPI program has completed.
This command is not very useful if you are running in the "client-to-client" mode of LAM/MPI (which is the default). You must specifically say -lamd on your mpirun command line for this command to work as expected.
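For example, a sketch of a session where mpimsg is useful (hello is the binary from the earlier example):
% mpirun -lamd N hello
% mpimsg
mpimsg will list any messages that the program left unreceived.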
REMEMBER: Correct MPI programs do not leave messages lying around; all messages should be received during the run of your program.
To kill the running MPI program and erase all pending messages, use lamclean:
% lamclean -v
NOTE: lamclean should only have to be used for debugging -- i.e. programs that hang, messages that are left around, etc. Correct MPI programs should terminate properly and clean up all their messages.
It is extremely important that each MPI session be shut down using the LAM shutdown commands. To properly shut down the current MPI session:
% lamhalt
Unless some of your LAM/MPI nodes have crashed, this will successfully shut down LAM/MPI and return you to a command prompt. If lamhalt hangs and does not return to a command prompt, you will need to use the wipe command:
% wipe -v hostfile
You should observe that the LAM daemons on all of the hosts in your hostfile are killed. If you use the hostfile listed above, you should see:
LAM 6.5.6 - University of Notre Dame

Executing tkill on n0 (thomas.helios.nd.edu)...
Executing tkill on n1 (austen.helios.nd.edu)...
Executing tkill on n2 (larson.helios.nd.edu)...
Executing tkill on n3 (twain.helios.nd.edu)...
Executing tkill on n4 (trudeau.helios.nd.edu)...
IMPORTANT: You must terminate each MPI session with the lamhalt and/or wipe commands, because leftover daemons will cause unpredictable results and can even crash the machines in subsequent MPI sessions. Leaving daemons behind is poor and unsociable computing practice.
You should also shut down your ssh-agent. Otherwise, anyone sitting down at the machine might be able to log into all the FCS machines as you without a password. (If you look at the SSH_AUTH_SOCK file, it is owned by you and only readable by you, so it should be safe, but why take the chance?)
$ ssh-agent -k
The Dalhousie Unix machines that you are using are a shared resource. Unless everyone behaves, the machines become unstable. Please clean up after yourself! Make sure you leave these machines as you found them!
- Be sure to halt and wipe your cluster before you log off. Check that the lamd process is really dead on all machines you have been using! If some of the lamd processes are still around, try:
caper$ user=`whoami`; for host in `grep -v '^#' floor3.bhost`; do echo $host; ssh $host "ps -fu $user; skill lamd; sleep 2; ps -fu $user"; done
- Be sure to clean up any files that you have left in the /tmp directory.
- Be sure that your program closes all files correctly; otherwise the machines will run out of file handles and crash.
caper$ wget http://www.cs.dal.ca/~arc/resources/MPI/sampleCode//MPI_C_SAMPLE.c
caper$ wget http://www.cs.dal.ca/~arc/resources/MPI/sampleCode//makefile
caper$ gmake
caper$ cp /opt/MPI/lam/boot/floor3.bhost .
[Edit floor3.bhost to make sure the present machine is first]
caper$ eval `ssh-agent`
caper$ ssh-add
[Enter passphrase for testuser@borg: foobar]
caper$ recon -v floor3.bhost
caper$ lamboot -v floor3.bhost
caper$ tping -c3 N
caper$ mpirun N hello
Enter the number of times around the ring: 2
Process 0 sending 2 to 1
Process 0 received 2
Process 0 decremented num
[....]
caper$ lamhalt
caper$ ssh-agent -k