Sun Grid Engine at Dal FCS QuickStart

Theory

The idea of Sun Grid Engine is that you should not need to know the details of where your job is running; you should just say what it needs and let it happen. (You should be able to plug your program into the computing resources the way you plug a lamp into the electrical grid.)

Grid Engine can work as a batch job submission system that finds the computing resources you need, runs your job, and delivers the results to you in one of several possible ways. It can also coordinate your MPI jobs, picking a reasonable set of machines to run on. Further, it can do parallel makes or give you a prompt on some machine that is not busy. If you write your job so that it can receive certain signals and act on them, your job can also checkpoint periodically, and can be stopped and restarted on another machine if the machine it is running on becomes busy.
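
As a rough illustration of the signal handling mentioned above, here is a minimal sketch of a job script that saves its progress when warned. It assumes the job is submitted with the "-notify" option, so that Grid Engine sends SIGUSR1 ahead of a suspend and SIGUSR2 ahead of a kill. The checkpoint file name, the step counter and the save_state function are made up for this example, and the checkpointing setup at this site may differ.

```shell
#!/bin/sh
#$ -S /bin/sh
# Ask Grid Engine to send warning signals before suspending or killing the job.
#$ -notify

# Hypothetical checkpoint location; adjust to taste.
CKPT_FILE=${CKPT_FILE:-$HOME/myjob.ckpt}
LAST_STEP=5
STEP=0

# Record the last completed step so that a restart can resume from it.
save_state() {
    echo "$STEP" > "$CKPT_FILE"
}

# With -notify, SIGUSR1 precedes a suspend and SIGUSR2 precedes a kill.
trap 'save_state' USR1
trap 'save_state; exit 0' USR2

# Resume from an earlier checkpoint if one exists.
if [ -r "$CKPT_FILE" ]; then
    STEP=$(cat "$CKPT_FILE")
fi

while [ "$STEP" -lt "$LAST_STEP" ]; do
    STEP=$((STEP + 1))
    echo "working on step $STEP"    # stand-in for the real work
done
```

A real job would do its work inside the loop and would probably save more than a single counter.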

In theory it can also operate in a heterogeneous environment, but that will have to wait until there are non-Solaris Unix machines integrated into the "cell".

Setup

You will need to import the variables for this installation into your shell environment. Edit your ~/.bashrc to include:


        SGE_SETTINGS=/opt/gridware/sge/default/common/settings.sh
        if [ -r ${SGE_SETTINGS} ]
        then
            source ${SGE_SETTINGS}
        fi

Here it is in the context of my .bashrc:


        # setup packages
        #
        #export debug=1
        PACKAGES="$SYSPKG:matlab"
        if \type setup_packages > /dev/null 2>&1 
        then
            setup_packages
        fi

        SGE_SETTINGS=/opt/gridware/sge/default/common/settings.sh
        if [ -r ${SGE_SETTINGS} ]
        then
            source ${SGE_SETTINGS}
        fi

        #
        # setup login session
        #
        if [ $interactive -eq 1 ]
        then

After you have added it, you will need to either logout and login again, or run source /opt/gridware/sge/default/common/settings.sh.
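
To confirm that the settings were picked up, you can check that SGE_ROOT is set (settings.sh normally exports it) and that qsub is on your PATH. The following is only a convenience sketch; the function name check_sge_env is made up for this guide.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Grid Engine environment is loaded.
check_sge_env() {
    rc=0
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set; source settings.sh first"
        rc=1
    else
        echo "SGE_ROOT=$SGE_ROOT"
    fi
    if command -v qsub > /dev/null 2>&1; then
        echo "qsub found at $(command -v qsub)"
    else
        echo "qsub is not on the PATH"
        rc=1
    fi
    return $rc
}

check_sge_env || echo "Grid Engine environment is incomplete"
```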

"qmon": Graphical Access to the System

The rest of this guide talks about command line versions of the various commands, mostly because understanding the command line options allows you to place the information necessary to correctly submit your job actually in your script. However, it is probably easier for the beginner to use the program "qmon" to access the various functions. From this program you can look at the state of the system's queues, what jobs are pending, running or completed, and submit jobs by ticking off boxes for the options desired.

On startup, qmon gives a block of buttons to access the various functions of the system. To start, you should choose the first button, in the upper left corner, which is the "Job Control" button. From here you can see the list of jobs currently scheduled, and "Submit" your own jobs.

"qsub"/"qmon": The Job Submit Program

To submit jobs to the system, you run "qsub". You must submit shell (csh, Bourne shell, etc.) scripts, not executables. You can place specifications on the command line for what resources you need, what your job can do, etc., but you can also add these parameters inside your script so that you do not have to remember which command line parameters to use each time.

Let us start with a simple example:


#!/bin/sh
#
#
# (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.  

# This is a simple example of a SGE batch script

# request Bourne shell as shell for job
#$ -S /bin/sh

#
# print date and time
date
# Sleep for 20 seconds
sleep 20
# print date and time again
date

Put this in a file called "simple.sh". From torch, or any other workstation, execute:
qsub simple.sh
You will receive notification that the job is accepted:
your job 176 ("simple.sh") has been submitted

Now you can wait for it to complete, or watch for it to start using:
qstat
or for a graphical interface
qmon
(which allows job submission too).

Once your job is finished, you will have two files in your home directory:

simple.sh.e176: The output to stderr of your script
simple.sh.o176: The output to stdout of your script
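
The number in the file names is the job id from the acceptance message. If you want to pick it up in a script, something like the following sketch could work. It assumes the message has exactly the form shown above, and the helper name job_id_from_message is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: extract the job id from qsub's acceptance message.
job_id_from_message() {
    echo "$1" | sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# In real use the message would come from: msg=$(qsub simple.sh)
msg='your job 176 ("simple.sh") has been submitted'
job_id=$(job_id_from_message "$msg")

echo "stdout file: $HOME/simple.sh.o$job_id"
echo "stderr file: $HOME/simple.sh.e$job_id"
```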

Adjusting Submission Properties

qsub has a wide range of command line parameters that can adjust the behaviour of your submitted job. All of these are available as check and text boxes in the graphical interface "qmon". You can run man qsub for documentation of the various parameters you can use during job submission. Rather than typing all these parameters on the command line (or adjusting the check boxes in qmon) each time you submit the job, you can include each of them in the script by adding a line of the form "#$ (option)".

For example, should you prefer to isolate the output files in a specific directory, or alter the names they receive, you can use the "-o filename" parameter. If you want the stdout and stderr streams in the same file, you can use "-j y", and if you prefer "/bin/sh" to "/bin/csh" as the interpreter of your job script, you can use "-S /bin/sh".

Written out at the top of a job script this would look like:


# request "/bin/sh" as shell for job
#$ -S /bin/sh
#$ -j y
# Output to sgeout.xxxxx
#$ -o $HOME/tmp/sge_output/sgeout.$JOB_NAME-$JOB_ID.$HOSTNAME

#  eMail at (b)eginning, (e)nd of job to (insert_your_username_here)@cs.dal.ca
#$ -m b
#$ -m e
#$ -M myusername@cs.dal.ca

#  The job is named MySimpleJob
#$ -N MySimpleJob

(Just a note: you must actually have a directory ~/tmp/sge_output before this will work. Otherwise your job will mysteriously sit in the queue and never get run. If this happens, you can try using the "Why?" key in qmon's Jobs display.)

You could also have set these parameters graphically in the qmon Job Submission screen.
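
Since a missing output directory leaves the job stuck in the queue, it can help to create the directory as part of submitting. The sketch below does that. The wrapper name submit_with_output_dir and the SGE_OUT_DIR variable are made up for this example, and the directory must match whatever your "-o" line names.

```shell
#!/bin/sh
# Assumed output directory, matching the "-o" line in the script above.
SGE_OUT_DIR=${SGE_OUT_DIR:-$HOME/tmp/sge_output}

# Hypothetical wrapper: make sure the directory exists, then submit.
submit_with_output_dir() {
    script=$1
    mkdir -p "$SGE_OUT_DIR" || return 1
    if command -v qsub > /dev/null 2>&1; then
        qsub "$script"
    else
        echo "qsub not found; would run: qsub $script"
    fi
}
```

You would then submit with "submit_with_output_dir simple.sh" instead of calling qsub directly.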

Request a Specific Environment

Let us say that you have detailed knowledge of your job's requirements, requirements that might not be available on some of the machines in the "complex" (set of machines in the gridengine system). You can specify these requirements also when you are submitting your job.

Here is an example of such a job:

the working data set that is going to be loaded into memory is at least 32M, but probably not more than 64M
the code was compiled for 64bit Solaris
it needs Matlab, and will consume only 1 license

As an additional detail, you have decided that you want your process to run at a lower priority so that it does not interfere with interactive performance in case your job happens to be running on a machine that someone has logged into.


#
#$ -p -10
#$ -l arch=solaris64
#$ -l virtual_free=32M

#  Note that if you specify h_vmem, it will ulimit you to that.
#$ -soft -l s_vmem=64M

#  Needs matlab
#$ -l matlab=1
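
The same requests can also be given on the command line. The sketch below only assembles and prints the command unless qsub and the script are actually present; "myjob.sh" is a placeholder name. It adds "-hard" before the Matlab request, on the assumption (worth checking in man qsub) that "-soft" would otherwise apply to every "-l" that follows it.

```shell
#!/bin/sh
# Same requests as the "#$" lines above. "-hard" is added on the assumption
# that "-soft" otherwise stays in effect for the requests that follow it.
QSUB_ARGS="-p -10 -l arch=solaris64 -l virtual_free=32M -soft -l s_vmem=64M -hard -l matlab=1"

# "myjob.sh" is a placeholder for your own job script.
if command -v qsub > /dev/null 2>&1 && [ -r myjob.sh ]; then
    # Left unquoted on purpose so the string splits into separate arguments.
    qsub $QSUB_ARGS myjob.sh
else
    echo "would run: qsub $QSUB_ARGS myjob.sh"
fi
```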

Array (Parameterized) Jobs

GridEngine has the ability to launch a series of jobs, optionally dependent on one another, with one submission request. This facility is called "Array Jobs". To submit an array job, add the parameter "-t n-m:s" [n=low, m=high, s=step] to the qsub command. This launches the same job script once per task, altering only the task identifier. Your job script finds out its array index by consulting the SGE_TASK_ID environment variable, and can then use that index to decide which data set it will work on. This only provides one dimension. To achieve multiple dimensions in your data set, simply use shell script loops to submit a series of array jobs, varying the second parameter in the loop. Here are some examples:

Sun's example demonstrating job dependencies: array_submitter.sh, which calls step_A_array_submitter.sh and step_B_array_submitter.sh as the dependent job.
Job script for running SETI@Home client: sge-start-seti.sh
Pair of scripts to probe an (x, y) problem space: submitter (varies x), job script (varies y)
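
For readers without those files at hand, here is a minimal sketch of an array job script in the same spirit. It is not one of the scripts listed above. The variable X (imagined as being passed in by a submitter with "-v X=value") and the function probe_point are made up for this example; SGE_TASK_ID is supplied by Grid Engine and supplies the y value.

```shell
#!/bin/sh
#$ -S /bin/sh
# Run tasks 1, 3, 5, 7 and 9 (low=1, high=9, step=2).
#$ -t 1-9:2

# X is a hypothetical variable set by the submitter; default to 1.
X=${X:-1}
# SGE_TASK_ID is set by Grid Engine; default to 1 when run by hand.
Y=${SGE_TASK_ID:-1}

# Stand-in for the real work on one point of the problem space.
probe_point() {
    echo "x=$1 y=$2"
}

echo "probing $(probe_point "$X" "$Y")"
```

A submitter for the second dimension would then loop over the x values and run qsub once per value, passing X with "-v".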

MPI

Note that currently LAM has not been configured to operate with GridEngine (or vice versa), so you must use MPICH for now.

MPI (Message Passing Interface) is a facility that allows programs running in different places (e.g. separate machines, or different CPUs in a multi-CPU machine) to communicate with each other. Normally, you would have to create a static list of the machines you would like your program to run on. GridEngine instead dynamically generates a list of machines that are not in use and passes it to your program to execute on.

Compile with MPICH

By default you likely have LAM in your PATH environment setting. Since LAM and MPICH share a common set of commands to compile and run your program, you will have to arrange for MPICH to come first in your path.


        MPIR_HOME=/opt/MPI/mpich
        export MPIR_HOME
        PATH="$MPIR_HOME/bin:$PATH"

Here it is in the context of my .bashrc:


        # setup packages
        #
        #export debug=1
        PACKAGES="$SYSPKG:matlab"
        if \type setup_packages > /dev/null 2>&1 
        then
            setup_packages
        fi

        SGE_SETTINGS=/opt/gridware/sge/default/common/settings.sh
        if [ -r ${SGE_SETTINGS} ]
        then
            source ${SGE_SETTINGS}
        fi


        MPIR_HOME=/opt/MPI/mpich
        export MPIR_HOME
        PATH="$MPIR_HOME/bin:$PATH"


        #
        # setup login session
        #
        if [ $interactive -eq 1 ]
        then

Now let us compile the MPI hello world program (with a sample Makefile). Download the files, put them in a directory, and run "make", which will create an executable called hello. (For a more detailed tutorial, using LAM, see the Dalhousie MPI Tutorial.)

Submit the job

Next, we must ask GridEngine to run the job. This script is slightly more complicated, so it is available for download: mpi_hello_job.sh.

There are some things to note:

#$ -pe mpich n-m: This selects the Parallel Environment mpich, with between n and m parallel jobs (e.g. 4 means 4 nodes in the MPI environment, and 2-6 means the job can run on between 2 and 6 nodes, as available).
#$ -v MPIR_HOME=...,COMMD_PORT,P4_RSHCOMMAND=rsh: This passes the MPIR_HOME environment variable on to the job being run so that the mpirun program can be found. It also says that COMMD_PORT should be passed on to subprocesses (so that MPI can find its peers), and that the system should use rsh to launch the remote jobs.
PATH="$TMPDIR:$PATH": GridEngine tricks the MPI environment into going through GE to launch its peers, so that GE can track the resource usage, etc. of the remote jobs. To do this, the temporary GridEngine version of rsh must come before the system version in the PATH.

You will have to make some adjustments at the top:

Adjust the variable MPI_PROGRAM to point to where you compiled hello.
Adjust the "#$ -o $HOME/tmp/sge_out/..." line to match where you would like your output to be stored.
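
To give a feel for how these pieces fit together, here is a rough sketch of such a job script. It is not the downloadable mpi_hello_job.sh. It assumes the usual MPICH integration, in which Grid Engine sets NSLOTS to the number of slots granted and writes the machine list to $TMPDIR/machines; the location of hello and the function name run_mpi_job are made up for this example.

```shell
#!/bin/sh
#$ -S /bin/sh
# Run on between 2 and 6 slots of the "mpich" parallel environment.
#$ -pe mpich 2-6
#$ -v MPIR_HOME=/opt/MPI/mpich,COMMD_PORT,P4_RSHCOMMAND=rsh

MPIR_HOME=${MPIR_HOME:-/opt/MPI/mpich}
# Hypothetical location of the compiled hello program.
MPI_PROGRAM=${MPI_PROGRAM:-$HOME/mpi/hello}

# Put Grid Engine's temporary rsh wrapper ahead of the system rsh.
if [ -n "${TMPDIR:-}" ]; then
    PATH="$TMPDIR:$PATH"
fi

run_mpi_job() {
    # NSLOTS and $TMPDIR/machines are assumed to be provided by the
    # mpich parallel environment when the job runs under Grid Engine.
    if [ -n "${NSLOTS:-}" ] && [ -n "${TMPDIR:-}" ] && [ -r "$TMPDIR/machines" ]; then
        "$MPIR_HOME/bin/mpirun" -np "$NSLOTS" \
            -machinefile "$TMPDIR/machines" "$MPI_PROGRAM"
    else
        echo "no Grid Engine parallel environment found; not starting $MPI_PROGRAM"
    fi
}

run_mpi_job
```

Run by hand, outside Grid Engine, the script only prints the message; the mpirun line is reached when the job is started through the mpich parallel environment.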

Make it go

As with the simple.sh program,
qsub mpi_hello_job.sh

As a note, changing the MPI_PROGRAM variable to
MPI_PROGRAM=$MPIR_HOME/example/cpi
will calculate Pi using the cpi program distributed with MPICH.