This document is a customized version of the "User Guide" tutorial for using OpenMP and Open MPI on the ACEnet network. It is organized into the following sections:
- Preliminary setup at ACEnet (only needs to be executed once)
- Compilers
- Multithreading with OpenMP
- MPI library
- Running jobs
- Debugging with TotalView
- FAQs
This document does not cover in detail the ethics and proper etiquette of using public clusters for parallel computing; users are nonetheless asked to consider the effects of improper use of OpenMP or Open MPI.
The Atlantic Computational Excellence Network (ACEnet) is a pan-Atlantic High Performance Computing (HPC) consortium providing distributed HPC resources, visualization and collaboration tools to participating research institutions. Currently, the ACEnet hardware resources are located at several universities and include the following clusters:
- Brasdor (brasdor.ace-net.ca) at StFX
- Fundy (fundy.ace-net.ca) at UNB
- Mahone (mahone.ace-net.ca) at Saint Mary's
- Placentia (placentia2.ace-net.ca) at MUN
- Glooscap (glooscap.ace-net.ca) at Dal
- Courtenay (courtenay.ace-net.ca) at UNBSJ

More information is available here.
First, you need to go to http://www.mun.ca/acenet/applications/ to apply for an ACEnet user account using the Project Account Number provided by Prof. Rau-Chaplin. Your account grants you access to all of the ACEnet clusters with the same username and password. When you log in to a particular cluster, you log in to the head node of that cluster, where you can edit, compile and test your code. All communication must be performed over the SSH network protocol using an SSH client. If you are using a Unix-like machine, you can "ssh" from the command prompt. On Windows systems, we suggest that you download the freely available client PuTTY. For example, if you want to access the Brasdor cluster from the command line on a Unix-like system, you would type
ssh -X username@brasdor.ace-net.ca
or, alternatively,
ssh -X brasdor.ace-net.ca -l username
where the optional -X flag enables X11 connection forwarding.
user@mahone: ~ $ ssh user@fundy.ace-net.ca
Password:
Last login: Tue Dec 11 13:35:25 2007 from 140.184.24.8
user@fundy: ~ $
The first time you connect to an ACEnet machine via SSH, you will see a message like the following:
The authenticity of host 'fundy.ace-net.ca (131.202.246.6)' can't be established.
RSA key fingerprint is ee:28:46:48:78:68:e3:28:ad:45:28:fe:c2:14:0c:d8.
Are you sure you want to continue connecting (yes/no)?
This is expected and you are safe to answer yes. You will then see a message
Warning: Permanently added 'fundy.ace-net.ca' (RSA) to the list of known hosts.
After connecting to the machine, you will be prompted for your credentials. Once you have logged in you should change your initial password. The command to do this is simply
passwd
You will be prompted for your current password and your new password (need advice on choosing a good password? Click here). Within minutes, your password change will be replicated across ACEnet.
The best way to transfer files to and from the cluster is to use a program that supports SFTP (SSH File Transfer Protocol). SFTP is similar to regular FTP; however, instead of sending your data in a readable plain-text format, SFTP encrypts the traffic. The commands for SFTP are the same as for FTP. It is available from the command line on most Unix-like systems. Mac OS X users: for a graphical SFTP client, check out Cyberduck. Windows users can also use a program similar to PuTTY called PSFTP, or get WinSCP for a graphical interface similar to Windows Explorer.
Command-line SFTP programs and PSFTP are similar to connecting via SSH. You can initiate a file transfer session with the following syntax
sftp user@fundy.ace-net.ca
You will be prompted for your password and, upon successful authentication, will see an interactive SFTP prompt.
user@mahone: ~ $ sftp user@fundy.ace-net.ca
Connecting to fundy.ace-net.ca...
Password:
sftp>
Type help at this prompt to see a list of available commands.
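For example, to upload a local source file to the cluster and download a results file back (the file names here are only illustrative), a session might look like this:

sftp> put omp_hello.c
sftp> get hello.out
sftp> quit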
The recommended and default login and Grid Engine shell is bash. If you want to change your shell from tcsh to bash, or vice versa, please contact ACEnet support. The commands /bin/bash and /bin/sh reference the same executable, which behaves a bit differently depending on the name it is invoked with, in order to mimic the behaviour of historical versions of sh. Bash is a UNIX shell written for the GNU Project. There are two files for bash or sh that you should have in your home directory: .bashrc and .profile.
The content of the default user .bashrc file:
# Load default ACEnet cluster profile
if [ -f /usr/local/lib/bashrc ]; then
. /usr/local/lib/bashrc
fi
#
# Add your settings below
#
The content of the default user .profile file:
# Do not delete or change this file
[[ -f ~/.bashrc ]] && . ~/.bashrc
For C shells (csh or tcsh), more information is available here.
Note: Passwordless SSH access within the cluster is already configured in your account at all sites except Placentia. Grid Engine relies on SSH to start job processes. If you want to configure passwordless SSH access yourself, generate an SSH key with the following set of commands:
$ ssh-keygen -t rsa
(hit enter three times or answer 'y')
$ cd ~/.ssh
$ cp id_rsa.pub authorized_keys
$ chmod 600 authorized_keys
If you want to set up passwordless SSH between different sites, you need to copy the three files id_rsa, id_rsa.pub and authorized_keys to the ~/.ssh directory on the other clusters. For example, to copy these files to the Fundy cluster, type the following:
$ cd ~/.ssh
$ scp id_rsa id_rsa.pub authorized_keys fundy.ace-net.ca:.ssh/
Several compiler suites are currently available:
- Portland Group compilers (preferred compilers)
- Sun Studio 12 compilers
- GNU compilers (gcc 3 and gcc 4)
Description: Portland Group compilers 8.0.1 for FORTRAN, C, C++ and High-Performance Fortran
Resources: Documentation, Tips & Techniques, etc.
Commands: pgcc, pgCC, pgf77, pgf90, pgf95, pghpf
Help for any command:
man pgf90
pgf90 -help
pgf90 -flags
Description: Sun Studio 12 compiler for Fortran, C, C++
Resources:
Commands: cc, CC, f77, f90, f95
Help for any command:
man f90
f90 -flags
Notes: Other features include: Garbage Collector, IDE, Performance Analyzer, X-Designer
Description: The GNU Compiler Collection is a set of programming language compilers produced by the GNU Project and distributed by the Free Software Foundation. Languages include C (gcc), C++ (g++), Fortran (g77), Ada (gnat), and Java (gcj).
Version:
- 3.4.6
- 4.1.2 with OpenMP support
Resources:
Commands for gcc3: gcc, g++, g77, gnat, gcj
Commands for gcc4: gcc4, g++4, gfortran
Help for any command:
man gcc
man gfortran
gcc --help
Notes: Please note that Red Hat Linux has not yet been updated at some sites, where you can still find gcc < 4.1 with no support for OpenMP. We anticipate that the upgrade will happen very soon.
The OpenMP Application Program Interface (API) supports multi-platform shared-memory parallel programming in C/C++ and FORTRAN on all architectures, including UNIX platforms and Windows NT platforms. Jointly defined by a group of major computer hardware and software vendors, OpenMP is a portable, scalable model that gives shared-memory parallel programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer.
OpenMP (Open Multi-Processing) is a standard for building parallel applications on shared-memory computers (multiprocessors). It consists primarily of a set of compiler directives, with some library routines besides. OpenMP is supported by PGI, Sun Studio and gcc4 compilers.
The maximum number of threads available for OpenMP jobs at ACEnet is either 4 or 16, depending on the cluster. Notice that the requested number of threads is communicated to the program at execution time with the environment variable OMP_NUM_THREADS.
For example, if you want your program to run on four cores, use the following in bash:
export OMP_NUM_THREADS=4
or in csh:
setenv OMP_NUM_THREADS 4
It may be a good idea to include one of these declarations into your shell profile to have the variable set as soon as you log in.
In order to force the compiler to interpret the OpenMP directives in the source code, you need to specify the appropriate flag during compilation; otherwise, serial code will be generated.

Compiler | OpenMP flag
PGI compilers | -mp
Sun Studio 12 | -xopenmp
GNU 4 | -fopenmp
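For example, assuming the omp_hello.c source used later in this guide, the corresponding compile commands look roughly as follows (only the PGI form appears verbatim later in this document; -xopenmp and -fopenmp are the standard OpenMP flags of the Sun Studio and GNU compilers):

pgcc -mp -o hello omp_hello.c
cc -xopenmp -o hello omp_hello.c
gcc4 -fopenmp -o hello omp_hello.c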
A good introduction to OpenMP can be found here.
You can monitor the parallel execution of your programs using mpstat.
We present here a simple hello world C program, compiled and run with the PGI compiler.
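A minimal sketch of what omp_hello.c might look like, consistent with the session output below (the actual file distributed for the course may differ):

/* omp_hello.c (sketch): greet from every thread of the team. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("Hello parallel world!\n");

    /* This block is executed once by each of the OMP_NUM_THREADS threads. */
    #pragma omp parallel
    {
        /* One thread reports the team size; the implicit barrier at the
           end of 'single' ensures this line is printed before the greetings. */
        #pragma omp single
        printf("Number of threads is %d\n", omp_get_num_threads());

        printf("Hello world from thread %d\n", omp_get_thread_num());
    }

    printf("Back to the sequential world.\n");
    return 0;
}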
$ pgcc -mp -o hello omp_hello.c
$ export OMP_NUM_THREADS=4
$ ./hello
Hello parallel world!
Number of threads is 4
Hello world from thread 3
Hello world from thread 0
Hello world from thread 1
Hello world from thread 2
Back to the sequential world.
$
MPI is suitable for parallel machines such as the IBM SP, SGI Origin, etc., but it also works well in clusters of workstations. Taking advantage of the availability of the clusters of workstations at Dalhousie, we are interested in using MPI as a single parallel virtual machine with multiple nodes.
The default (and preferred) MPI implementation at ACEnet is Open MPI. It is a free, open source, production-quality MPI-2 implementation. In some rare cases you may still need MPICH; however, please note that support for this library will soon be discontinued.
Note: Do not confuse the Open MPI library with OpenMP.
Resources:
- Instructional videos and presentations

Current version installed: Open MPI v1.2.7, configured with PGI (64-bit).
The Open MPI team strongly recommends that you simply use Open MPI's "wrapper" compilers to compile your MPI applications. That is, instead of using (for example) gcc to compile your program, use mpicc. Open MPI provides a wrapper compiler for four languages:
Language | Wrapper compiler name
C | mpicc
C++ | mpiCC, mpicxx, or mpic++
Fortran 77 | mpif77
Fortran 90 | mpif90
Hence, if you expect to compile your program as:
shell$ gcc my_mpi_application.c -o my_mpi_application
simply use the following instead:
shell$ mpicc my_mpi_application.c -o my_mpi_application
Note that Open MPI's wrapper compilers do not do any actual compiling or linking; all they do is manipulate the command line, add in all the relevant compiler/linker flags, and then invoke the underlying compiler/linker (hence the name "wrapper" compiler). More specifically, if you run into a compiler or linker error, check your source code and/or back-end compiler; it is usually not the fault of the Open MPI wrapper compiler.
We present here a simple C program that passes a message around a ring of processors.
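A minimal sketch of such a ring program, consistent with the sample session below (the actual MPI_C_SAMPLE.c distributed for the course may differ):

/* Ring sketch: process 0 reads a counter, sends it around the ring,
   and decrements it on every lap until it reaches zero. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, next, prev, num;
    const int tag = 201;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Neighbours in the ring; with one process a rank talks to itself,
       which works here because the tiny message is buffered eagerly. */
    next = (rank + 1) % size;
    prev = (rank + size - 1) % size;

    if (rank == 0) {
        printf("Enter the number of times around the ring: ");
        fflush(stdout);
        scanf("%d", &num);
        printf("Process 0 sending %d to %d\n", num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    }

    /* Pass the value around until it reaches zero. */
    while (1) {
        MPI_Recv(&num, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process %d received %d\n", rank, num);

        if (rank == 0) {
            num--;
            printf("Process 0 decremented num\n");
        }

        printf("Process %d sending %d to %d\n", rank, num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);

        if (num == 0)
            break;
    }

    /* Rank 0 drains the final zero so it is not left unreceived. */
    if (rank == 0)
        MPI_Recv(&num, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    printf("Process %d exiting\n", rank);
    MPI_Finalize();
    return 0;
}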
The simplest and most straightforward way to compile MPI programs is to modify an existing Makefile. We suggest that you modify this Makefile to your liking and expand on it as you become more comfortable with Open MPI.
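As an illustration only, a minimal Makefile consistent with the make output in the session below might look like this (recipe lines must be indented with a real tab character; the Makefile linked above remains the reference version):

# Hypothetical minimal Makefile for the ring example
CC      = mpicc
LDFLAGS = -L./libs

MPI_C_SAMPLE: MPI_C_SAMPLE.o
	$(CC) MPI_C_SAMPLE.o -o MPI_C_SAMPLE $(LDFLAGS)

MPI_C_SAMPLE.o: MPI_C_SAMPLE.c
	$(CC) -c MPI_C_SAMPLE.c

clean:
	rm -f MPI_C_SAMPLE MPI_C_SAMPLE.o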
andang@fundy.ace-net.ca's password:
Last login: Mon Jun 29 13:53:31 2009 from pcox-imac08.cs.dal.ca
andang@fundy: ~ $ make
mpicc MPI_C_SAMPLE.o -o MPI_C_SAMPLE -L./libs
andang@fundy: ~ $ ./MPI_C_SAMPLE
Enter the number of times around the ring: 2
Process 0 sending 2 to 0
Process 0 received 2
Process 0 decremented num
Process 0 sending 1 to 0
Process 0 received 1
Process 0 decremented num
Process 0 sending 0 to 0
Process 0 exiting
To submit the code hello to the scheduler, which will allocate free computing resources to your job and run it on one of the computing nodes, you need to create a small submission script. With this script you instruct the scheduler where to execute the code, where to write the output, and with how many threads you want your code to be run. Here is an example of such a script, called submit_hello.sh:
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -o hello.out
#$ -pe openmp 4
#$ -l h_rt=01:00:00
export OMP_NUM_THREADS=$NSLOTS
./hello
Finally, to submit the job, type at the command line
qsub submit_hello.sh
Your code will be submitted and eventually run with 4 threads. To check the status of your code, type
qstat
If the status is qw then the job is waiting in the queue, if it is r then the job is running, and if there is nothing then the job has finished. Now you can check the results. The output should be in the file hello.out, which we specified in the job submission script.
You should use the ompi* parallel environment for Open MPI jobs. There is no need to specify the list of hosts and the number of processes for the mpirun command, because Open MPI will obtain this information directly from Sun Grid Engine.
#$ -S /bin/bash
#$ -cwd
#$ -N test_parallel
#$ -j y
#$ -o test_parallel.log
#$ -l h_vmem=1G
#$ -l h_rt=01:00:00
#$ -pe ompi* 4
mpirun MPI_C_SAMPLE
Save the script to "<job_script.sh>" and run the job with the following command:
qsub <job_script.sh>
along with any necessary options. The submission script is a handy and flexible tool for setting these options and passing them to the scheduler along with the job name, though it is not strictly required. Typical job submission scripts suited for different types of jobs can be found on the Job control page.
The login node or "head node" on each cluster is intended for managing jobs and files, not for significant computing. As a guideline, any process run on the head node should not consume more than 15 minutes of CPU time. Note that this is not the same as 15 minutes of elapsed time: Login sessions, for example, may last arbitrarily long, but consume little CPU.
All longer jobs must be submitted to the compute hosts via the scheduler, which manages the available resources and assigns them to the waiting jobs. The scheduler used on all ACEnet clusters is Sun Grid Engine (SGE), which is also known as the N1 Grid Engine (N1GE).

Interactive testing on the head node:
$ mpirun -np 4 my_parallel_application
Interactive session through Sun Grid Engine:
$ qrsh -cwd -V -l h_rt=00:10:00,test=true -pe ompi\* 4 my_parallel_application
Please refer to the Job control wiki page for detailed information on how you can manage your jobs. Also, check out the commands qsum and showq.
The TotalView debugger can be used for debugging both serial and parallel (MPI, OpenMP) applications. Parallel program users, however, will find TotalView extremely useful due to its focus on debugging multi-processor programs. It contains both a graphical and a command line interface, and it includes several features for MPI and OpenMP debugging.
Version: 8.6.2-2
Clusters: Glooscap, Placentia, Mahone, Fundy, Courtenay
Resources:
- TotalView Support, Documentation, Video Tutorials, Tips & Tricks
- TotalView Tutorial from Lawrence Livermore National Laboratory
- Open MPI FAQ: How do I run with the TotalView parallel debugger?
In order to provide the necessary symbolic debug information for a debugger, you need to recompile your code. Usually, this requires passing the -g flag to your compiler.
mpif90 -g -o test test.f90
- Graphical interface: totalview
- Command line interface: totalviewcli
If you want to use the GUI-based TotalView parallel debugger, you need to make sure that you are connecting to the head node of the cluster with X11 forwarding enabled in your SSH client. That will allow the windows of a remotely started application to be shown on your own desktop. Unix users need to run an X11 server on their desktop (if you are running any window manager then you already have an X11 server installed) and connect to the head node with the -X option to the SSH client (ssh -X servername.ace-net.ca). Windows users need to install XMing and connect with the PuTTY program with X11 forwarding enabled.
Before you start debugging with the TotalView parallel debugger, you will need to create a file in your home directory named $HOME/.tvdrc with the following content:
source /usr/local/openmpi/etc/openmpi-totalview.tcl
This will configure TotalView to skip mpirun and jump right into your MPI application; otherwise it will stop deep in the machine code of mpirun itself, which is not what most users want.
You can use the TotalView debugger either on the head node or through the Grid Engine interactive queues (Placentia and Courtenay do not support debugging through the queues yet). To debug a job, you just need to include --debug in the command line. Open MPI will automatically invoke TotalView to run your MPI process.
If your application is not computationally intensive, does not use a lot of memory, and you are running debugging sessions for short periods of time with a small number of processes (no more than 4), then you can debug your program on the head node.
mpirun --debug -np 4 my_parallel_application
If your debugging sessions do not qualify to run on the head node, then you need to use the dedicated test.q resources, which allow you to run a job for less than 1 hour. This option is available at the following sites: Mahone, Fundy, Glooscap. Depending on the cluster, you can request up to 8 slots/processes from Grid Engine.
qrsh -V -cwd -pe ompi 4 -l h_rt=00:30:00,test=true mpirun --debug myapplication
If you are debugging large jobs and require more than 4-8 processes for your job, then you can request free slots for an interactive job in the production short.q queue. If free resources are available, they will be granted to you.
qrsh -V -cwd -pe ompi 20 -l h_rt=00:30:00 mpirun --debug myapplication
Why can't I log in?
First, check WAVELETS and the front wiki page to ensure that the machine is not in a scheduled maintenance outage. Sometimes during such an outage the machine may present a login prompt but refuse to recognize your credentials.
If that's not the problem, email support at ace-net.ca.
Why doesn't my job start right away?
This could be for a variety of reasons. When you submit a job to the N1 Grid Engine, you are making a request for resources. There may be times when the cluster is busy and you will be required to wait for resources. If you use the qstat command, you may see qw next to your job. This indicates that it is in the queue and waiting to be scheduled. If you see an r next to your job, then your job is running.
That said, it is often not clear which missing resources are preventing your job from being scheduled. Most often it is memory, h_vmem, that is in short supply. You may be able to increase your job's likelihood of being scheduled by reducing its memory requirements, so that it needs fewer resources. For example:
qalter -l h_vmem=500M,h_rt=hh:mm:ss job_id
will reduce the virtual memory reserved for the job to 500 megabytes. (You must re-supply the h_rt and any other arguments to -l when you use qalter.) The default values are listed on the Job control page. Note that for parallel jobs, this h_vmem request is per process. The scheduler will only start your job if it can find a host (or hosts) with enough memory unassigned to other jobs. You can determine the vmem available on various hosts with
qhost -F h_vmem
or you can see how many hosts have at least, say, 8 gigabytes free with
qhost -l h_vmem=8G
You can also try defining a short time limit for the job:
qalter -l h_rt=0:1:0,other args job_id
imposes a hard run-time limit of 0 hours, 1 minute, 0 seconds (0:1:0). In certain circumstances the scheduler will be able to schedule a job that it knows will finish quickly, where it cannot schedule a longer job.
My job is running well but I noticed an error message
[cl005:00XXX] ras:gridengine: JOB_ID: YYYYY
This is not an error but a diagnostic message generated to the output file for every Open MPI job, and it contains some useful information:
- the name of the shepherd host: cl005
- the process ID (PID) of the mpirun command on the shepherd host: 00XXX
- the Grid Engine job ID: JOB_ID: YYYYY
I get the error message
Open RTE was unable to open the hostfile:
/tmp/XXXXX.1.short.q/machines
Check to make sure the path and filename are correct.
You should not be using the -machinefile option with the mpirun command in your submission script. Open MPI will obtain all necessary information directly from Sun Grid Engine. Check out the typical submission script here.
My job was running fine but then it got terminated with the message
[cl0XX:YYYYY] ERROR: A daemon on node cl0ZZ failed to start as expected.
[cl0XX:YYYYY] ERROR: There may be more information available from
[cl0XX:YYYYY] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[cl0XX:YYYYY] ERROR: If the problem persists, please restart the
[cl0XX:YYYYY] ERROR: Grid Engine PE job
or for Myrinet, with the message
MX:cl0XX:Remote endpoint is closed, peer=00:60:dd:xx:yy:zz (cl0XX:0)
MX:cl0XX:Remote endpoint is closed, peer=00:60:dd:xx:yy:zz (cl0XX:0)
or for Ethernet, with the messages
mca_btl_tcp_frag_recv: readv failed with errno=104
mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111
It likely means that the job was killed because of the run-time limit (h_rt). Check the run-time of your job (start_time and end_time) with the following command: qacct -j <job_id>, and compare it to the h_rt parameter in your submission script. You can also get these error messages when your job fails and some of the processes die or segfault, while others lose communication because of that and have to be killed.
How do I run X Windows (X11) on Microsoft Windows?
We recommend XMing which is very straightforward to install and easy to get working. Check out the guide here.
Permission denied error messages
If you see one of the following messages
Permission denied (publickey,password,keyboard-interactive).
Permission denied, please try again.
or
(gnome-ssh-askpass:2810): Gtk-WARNING **: cannot open display:
Permission denied, please try again.
then please check that the passwordless SSH access is configured properly.