Grid Engine can work as a batch job submission system that finds the computing resources you need, runs your job, and delivers the results to you in one of several possible ways. It can also coordinate your MPI jobs, picking a reasonable set of machines to run on. Further, it can do parallel makes or give you a prompt on a machine that is not busy. If you write your job so that it can receive certain signals and act on them, your job can also checkpoint periodically, and it can be stopped and restarted on another machine if the machine it is running on becomes busy.
In theory it can also operate in a heterogeneous environment, but that will have to wait until there are non-Solaris Unix machines integrated into the "cell".
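The signal handling mentioned above could look roughly like the following. This is only a sketch, assuming a POSIX-compatible sh and the qsub "-notify" option, which asks Grid Engine to send SIGUSR1 or SIGUSR2 shortly before it suspends or kills a job. The names STATE_FILE, save_state and run_job are made up for illustration and are not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical checkpointing job; file and function names are illustrative.
#$ -S /bin/sh
# -notify: send SIGUSR1/SIGUSR2 shortly before SIGSTOP/SIGKILL.
#$ -notify

STATE_FILE=${STATE_FILE:-./checkpoint.state}

# Record how far the job has progressed.
save_state() {
    echo "$step" > "$STATE_FILE"
}

# run_job TOTAL_STEPS: resume from STATE_FILE if it exists, else start at 0.
run_job() {
    total=$1
    step=0
    if [ -r "$STATE_FILE" ]; then
        step=$(cat "$STATE_FILE")
        echo "resuming after step $step"
    fi
    # On a warning signal, save progress and stop so the job can be rerun.
    trap 'save_state; echo "checkpointed at step $step"; exit 1' USR1 USR2
    while [ "$step" -lt "$total" ]; do
        # ... one unit of real work would go here ...
        step=$((step + 1))
        save_state
    done
    trap - USR1 USR2
    rm -f "$STATE_FILE"
    echo "finished $step steps"
}

# Start the work only when running as a Grid Engine job (JOB_ID is set).
if [ -n "${JOB_ID:-}" ]; then
    run_job 100
fi
```

Because the state file is rewritten after every step, a job that is stopped and submitted again picks up where it left off instead of starting from zero.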
SGE_SETTINGS=/opt/gridware/sge/default/common/settings.sh
if [ -r ${SGE_SETTINGS} ]
then
source ${SGE_SETTINGS}
fi
Here it is in the context of my .bashrc:
# setup packages
#
#export debug=1
PACKAGES="$SYSPKG:matlab"
if \type setup_packages > /dev/null 2>&1
then
setup_packages
fi
SGE_SETTINGS=/opt/gridware/sge/default/common/settings.sh
if [ -r ${SGE_SETTINGS} ]
then
source ${SGE_SETTINGS}
fi
#
# setup login session
#
if [ $interactive -eq 1 ]
then
After you have added it, you will need to either log out and
log in again, or run source /opt/gridware/sge/default/common/settings.sh.
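To confirm that the settings took effect, a small check along these lines may help. It is a sketch only: it assumes that settings.sh exports SGE_ROOT and puts qsub on your PATH, and the function name check_sge_env is invented for this example.

```shell
# check_sge_env: report whether the Grid Engine environment looks usable.
# Returns 0 if SGE_ROOT is set and qsub can be found, 1 otherwise.
check_sge_env() {
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set; settings.sh has not been sourced" >&2
        return 1
    fi
    if ! command -v qsub > /dev/null 2>&1; then
        echo "qsub is not on PATH" >&2
        return 1
    fi
    echo "Grid Engine environment OK (SGE_ROOT=$SGE_ROOT)"
}
```

Running check_sge_env in a fresh shell should print the OK line; if it complains, the snippet is probably not being reached in your .bashrc.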
On startup, qmon gives a block of buttons to access the various functions of the system. To start, you should choose the first button, in the upper left corner, which is the "Job Control" button. From here you can see the list of jobs currently scheduled, and "Submit" your own jobs.
Let us start with a simple example:
#!/bin/sh
#
#
# (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
# This is a simple example of a SGE batch script
# request Bourne shell as shell for job
#$ -S /bin/sh
#
# print date and time
date
# Sleep for 20 seconds
sleep 20
# print date and time again
date
Put this in a file called "simple.sh".
From torch, or any other workstation, execute:
qsub simple.sh
You will receive notification that the job is accepted:
your job 176 ("simple.sh") has been submitted
Now you can wait for it to complete, or watch for it to start
using:
qstat
or for a graphical interface
qmon
(which allows job submission too).
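If you would rather have a script wait for the job than watch qstat by hand, something like the following could work. It is a sketch under two assumptions: that the acknowledgement from qsub has the form shown above, and that "qstat -j <id>" fails once the job is no longer known to the system. The helper names extract_job_id and wait_for_job are hypothetical.

```shell
# extract_job_id: pull the numeric job id out of qsub's acknowledgement line.
extract_job_id() {
    echo "$1" | sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# wait_for_job JOB_ID [POLL_SECONDS]: poll until qstat no longer knows the job.
wait_for_job() {
    job_id=$1
    poll=${2:-10}
    while qstat -j "$job_id" > /dev/null 2>&1; do
        sleep "$poll"
    done
    echo "job $job_id has left the queue"
}

# Usage:
#   ack=$(qsub simple.sh)
#   wait_for_job "$(extract_job_id "$ack")"
```

Polling every ten seconds is an arbitrary default; pass a second argument to wait_for_job to change it.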
Once your job is finished, you will have files in your home directory called
For example, should you prefer to isolate the output files in a specific directory, or alter the names they receive, you can use the "-o filename" parameter. If you want the stdout and stderr streams in the same file, you can use "-j y", and if you prefer "/bin/sh" to "/bin/csh" as the interpreter of your job script, you can use "-S /bin/sh".
Written out at the top of a job script this would look like:
# request "/bin/sh" as shell for job
#$ -S /bin/sh
#$ -j y
# Output to sgeout.xxxxx
#$ -o $HOME/tmp/sge_output/sgeout.$JOB_NAME-$JOB_ID.$HOSTNAME
# Email at (b)eginning and (e)nd of job to (insert_your_username_here)@cs.dal.ca
#$ -m be
#$ -M myusername@cs.dal.ca
# The job is named MySimpleJob
#$ -N MySimpleJob
(Just a note: you must actually have a directory ~/tmp/sge_output before this will work.
Otherwise your job will mysteriously sit in the queue and never get run. If
this happens, you can try using the "Why?" key in qmon's Jobs display.)
You could also have set these parameters graphically in the qmon Job Submission screen.
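The same options can be given on the qsub command line. The sketch below also creates the output directory first, so the job does not get stuck in the queue for lack of it. The function name submit_with_output_dir is hypothetical; the options themselves are the ones listed above.

```shell
# submit_with_output_dir SCRIPT: hypothetical helper that creates the
# output directory and passes the options on the qsub command line.
submit_with_output_dir() {
    script=$1
    out_dir="$HOME/tmp/sge_output"
    mkdir -p "$out_dir" || return 1
    # \$JOB_NAME etc. are escaped so that Grid Engine, not this shell,
    # expands them when the job runs.
    qsub -S /bin/sh -j y \
         -o "$out_dir/sgeout.\$JOB_NAME-\$JOB_ID.\$HOSTNAME" \
         -N MySimpleJob \
         "$script"
}

# Usage:
#   submit_with_output_dir simple.sh
```

Options on the command line are convenient for one-off changes, while "#$" lines in the script keep the settings with the job.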
Here is an example:
#
#$ -p -10
#$ -l arch=solaris64
#$ -l virtual_free=32M
# Note that if you specify h_vmem, it will ulimit you to that.
#$ -soft -l s_vmem=64M
# Needs matlab (switch back to -hard, since -soft applies to every -l that follows it)
#$ -hard -l matlab=1
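Since requests like these are easy to overlook among ordinary comments, it can be handy to list just the embedded options of a job script before submitting it. A minimal sketch, with a made-up function name list_directives, that prints every line starting with "#$":

```shell
# list_directives FILE: print the Grid Engine options embedded in a job
# script, i.e. the lines that start with "#$", with that prefix removed.
list_directives() {
    sed -n 's/^#\$ *//p' "$1"
}

# Usage:
#   list_directives simple.sh
```

For the example above this would print one line per request, such as "-p -10" and "-l arch=solaris64".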
MPI (Message Passing Interface) is a facility that allows programs running in different places (e.g., separate machines, or different CPUs in a multi-CPU machine) to communicate with each other. Normally, you would have to create a static list of the machines you would like your program to run on. Grid Engine will instead dynamically generate a list of machines that are not in use and pass this list to your program to execute on.
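To make the generated machine list concrete: for a job in a parallel environment, Grid Engine sets PE_HOSTFILE to a file with one line per host, giving the host name, the number of slots, the queue and a processor range. The sketch below expands such a file into one host name per slot, the plain list an MPICH mpirun accepts with -machinefile. The function name make_machinefile and the host names in the comment are invented for illustration, and your site's MPI integration may already do this step for you.

```shell
# make_machinefile HOSTFILE: expand "host slots queue range" lines into
# one host name per slot.
make_machinefile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Usage inside a parallel job:
#   make_machinefile "$PE_HOSTFILE" > machines
#
# A hostfile containing
#   node1 2 all.q@node1 UNDEFINED
#   node2 1 all.q@node2 UNDEFINED
# becomes the three lines node1, node1, node2.
```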
SGE_SETTINGS=/opt/gridware/sge/default/common/settings.sh
if [ -r ${SGE_SETTINGS} ]
then
source ${SGE_SETTINGS}
fi
Here it is in the context of my .bashrc, with the MPICH settings added:
# setup packages
#
#export debug=1
PACKAGES="$SYSPKG:matlab"
if \type setup_packages > /dev/null 2>&1
then
setup_packages
fi
SGE_SETTINGS=/opt/gridware/sge/default/common/settings.sh
if [ -r ${SGE_SETTINGS} ]
then
source ${SGE_SETTINGS}
fi
MPIR_HOME=/opt/MPI/mpich
export MPIR_HOME
PATH="$MPIR_HOME/bin:$PATH"
#
# setup login session
#
if [ $interactive -eq 1 ]
then
Now, let us compile the MPI hello world program (with a sample Makefile). Download the files, put them in a directory, and run "make", which will create an executable called hello. For a more detailed tutorial, using LAM, see the Dalhousie MPI Tutorial.
There are some things to note:
As with the simple.sh program, submit the job with:
qsub mpi_hello_job.sh
As a note, changing the MPI_PROGRAM variable to
MPI_PROGRAM=$MPIR_HOME/example/cpi
will calculate Pi using the cpi program distributed with MPICH.
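For orientation, an MPI job script of this kind might look roughly like the following. This is not the original mpi_hello_job.sh, only a hypothetical stand-in. It assumes a parallel environment named "mpich" exists at your site, that its startup procedure writes the machine list to $TMPDIR/machines, and that mpirun is on your PATH as set up above. NSLOTS is the number of slots Grid Engine granted to the job.

```shell
#!/bin/sh
# Hypothetical stand-in for an MPI job script; not the original file.
#$ -S /bin/sh
#$ -cwd
#$ -N mpi_hello
# Request 4 slots from a parallel environment assumed to be called "mpich".
#$ -pe mpich 4

MPI_PROGRAM=${MPI_PROGRAM:-./hello}

# run_mpi: start the program on the machines Grid Engine selected.
run_mpi() {
    mpirun -np "${NSLOTS:-1}" -machinefile "${TMPDIR:-/tmp}/machines" "$MPI_PROGRAM"
}

# Launch only when running as a Grid Engine job (JOB_ID is set).
if [ -n "${JOB_ID:-}" ]; then
    run_mpi
fi
```

Using $NSLOTS rather than a fixed number keeps the mpirun process count in step with whatever the "-pe" request was granted.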