OpenMP Parallel Computing
Using the Omni Compiler at Dalhousie


1. Introduction

    1.1 About this document

    This document is a customized version of the Omni documentation for using OpenMP at Dalhousie University, Faculty of Computer Science. The document is organized into six sections:

    1. Introduction
    2. How to Compile OpenMP Programs
    3. Test Programs
    4. How to Monitor OpenMP Programs
    5. How to Profile OpenMP Programs
    6. Example Sessions
       

NOTE: Sun's Forte tools for compiling and profiling/tuning OpenMP code are now also available.

1.2 About the Omni OpenMP compiler

The Omni OpenMP compiler is available on Locutus. It is a software suite that translates C and Fortran programs containing OpenMP directives into code suitable for compilation with a native compiler and linking against the Omni OpenMP runtime library.

OpenMP is designed for Symmetric Multiprocessors (SMPs) such as Locutus and Borg. For example, Locutus is a Sun Enterprise 4500 with 3 GB of RAM and 8 processors running Solaris 7: an ideal machine for trying out OpenMP programs. NOTE: You must use ssh to log into Locutus; this machine does not respond to Telnet requests.
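For example (substituting your own account name; use the fully qualified hostname if the short form shown here does not resolve from your machine):

        yourhost% ssh yourusername@locutus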

2. How to Compile and run your OpenMP program

The command to compile OpenMP C programs is omcc. The command to compile OpenMP Fortran 77 programs is omf77.

To run an OpenMP program, simply execute the compiled binary on your SMP platform (Locutus). The number of processors (operating system threads) used is controlled by the OMPC_NUM_PROCS environment variable, as shown below.
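For example, to compile a program and run it with 4 threads (myprog.c is a placeholder name; OMPC_NUM_PROCS is the variable used in the sessions of Section 6):

locutus% omcc -o myprog.exe myprog.c
locutus% setenv OMPC_NUM_PROCS 4
locutus% ./myprog.exe

Under bash, use export OMPC_NUM_PROCS=4 instead of setenv.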

3. Test programs

Several OpenMP test programs are included in the "tests" directory. This directory can be found at

        /data/courses/csci6702/Omni-1.4a/tests

A set of example programs, including a simple "Hello World" program, can be obtained as follows:

locutus% wget www.cs.dal.ca/~arc/resources/OpenMP/example2/example2.tar.gz
locutus% gunzip example2.tar.gz
locutus% tar xvf example2.tar
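For reference, a minimal OpenMP "Hello World" in C looks like the sketch below. This is an illustration consistent with the output shown in Section 6, not necessarily the exact omp_hello.c distributed with the examples:

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            #pragma omp parallel
            {
                int tid = omp_get_thread_num();     /* this thread's id */
                printf("Hello World from thread = %d\n", tid);
                if (tid == 0)   /* thread 0 reports the team size */
                    printf("Number of threads = %d\n", omp_get_num_threads());
            }
            return 0;
        }

Compile and run it with omcc as described in Section 2.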

4. How to Monitor your OpenMP program

You can monitor the parallel execution of your programs using

mpstat

mpstat reports per-processor statistics in tabular form; see man mpstat for details. For a more visual representation, try a graphical load monitor such as perfmeter.
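For example, to print a fresh set of per-processor statistics every 5 seconds while your program runs in another window:

locutus% mpstat 5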

5. How to Profile your OpenMP program

Sometimes your OpenMP program will not speed up as you expect. In such cases, profiling the program may help you analyze and solve the problem.

To enable profiling of the execution, set the OMPC_LOG environment variable:

        setenv OMPC_LOG

or, if you are using the bash shell:

        export OMPC_LOG=1

Then run your OpenMP program as usual. For example:

        ./a.out

When you run your program, a log file is created, named after your program with a ".log" extension. For example, if the program is named "a.out", the log file "a.out.log" is created. The log file records timings for parallel regions, barriers, and scheduling events. The kinds of events appearing in the log file are defined in "lib/libtlog/tlog.h".

To see the log file, use "tlogview" command:

       tlogview a.out.log

tlogview is a profile visualization tool that lets you examine the states and events of the OpenMP runtime over the course of the execution. By dragging over a region of the window with the mouse, you can zoom in on the selected interval to inspect it more precisely. For more information on using tlogview, see the tlogview documentation.

To disable profiling, unset the OMPC_LOG environment variable:

        unsetenv OMPC_LOG

or, if you are using the bash shell:

        unset OMPC_LOG

Note that when profiling is on, execution slows down because of the overhead of timestamping each event. By default, the timer routine uses the "gettimeofday" system call, which returns wall-clock time. The timing resolution depends on the timer routine; you may replace it with a platform-specific routine in "lib/libtlog/tlog-time.c".
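For instance, on Solaris one could use the high-resolution gethrtime() call. The sketch below assumes a timer function that returns seconds as a double; the actual function name and interface in "lib/libtlog/tlog-time.c" may differ:

        #include <sys/time.h>   /* gethrtime() on Solaris */

        /* Hypothetical replacement timer returning seconds as a double;
           check tlog-time.c for the actual name and calling convention. */
        double tlog_get_time(void)
        {
            return (double)gethrtime() * 1.0e-9;    /* nanoseconds -> seconds */
        }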

6. Example Sessions

locutus% wget www.cs.dal.ca/~arc/resources/OpenMP/example2/omp_hello.c
locutus% omcc -o hello.exe omp_hello.c
locutus% ./hello.exe
Hello World from thread = 1
Hello World from thread = 6
Hello World from thread = 5
Hello World from thread = 4
Hello World from thread = 7
Hello World from thread = 2
Hello World from thread = 0
Number of threads = 8
Hello World from thread = 3

More example sessions:

locutus:/tmp/Omni-Test$ cp -r /var/http/htdocs/tech_support/Resources/Omni/cg .
locutus:/tmp/Omni-Test$ cd cg
locutus:/tmp/Omni-Test/cg$ gmake
cc    -c -o second.o second.c
cc  -o cg cg.c second.o -lm
${OMNI_HOME:=/usr/local}/bin/omcc  -o cg-omp cg.c second.o -lm
Compiling 'cg.c'...
${OMNI_HOME:=/usr/local}/bin/omcc  -o cg-orphan cg-orphan.c second.o -lm
Compiling 'cg-orphan.c'...
cc  -o cg-makedata cg-makedata.c -lm
locutus:/tmp/Omni-Test/cg$ export OMPC_BIND_PROCS=true
locutus:/tmp/Omni-Test/cg$ OMPC_NUM_PROCS=1 ./cg-omp
omp_num_thread=1
omp_max_thread=1
    0    1.3764e-13      9.99864415791401
    1    2.1067e-15      8.57332792032217
    2    2.0809e-15      8.59545103740579
    3    2.0978e-15      8.59699723407375
    4    1.9100e-15      8.59715491517665
    5    2.0295e-15      8.59717443116078
    6    1.8605e-15      8.59717707049128
    7    1.9794e-15      8.59717744406296
    8    1.8638e-15      8.59717749839416
    9    1.8070e-15      8.59717750644093
   10    1.9231e-15      8.59717750764864
   11    1.9795e-15      8.59717750783180
   12    1.8284e-15      8.59717750785982
   13    1.7639e-15      8.59717750786414
   14    1.8498e-15      8.59717750786481
time = 5.178689, 0.345246 (0.000000e+00 - 5.178689e+00)/15, NITCG=25
locutus:/tmp/Omni-Test/cg$ OMPC_NUM_PROCS=2 ./cg-omp
omp_num_thread=1
omp_max_thread=2
    0    1.3879e-13      9.99864415791401
    1    2.1946e-15      8.57332792032217
    2    2.1119e-15      8.59545103740579
    3    2.0654e-15      8.59699723407375
    4    1.9867e-15      8.59715491517665
    5    2.1146e-15      8.59717443116078
    6    1.8785e-15      8.59717707049128
    7    1.8864e-15      8.59717744406296
    8    1.9467e-15      8.59717749839416
    9    1.8364e-15      8.59717750644093
   10    1.8862e-15      8.59717750764864
   11    1.7944e-15      8.59717750783180
   12    1.8616e-15      8.59717750785982
   13    1.8112e-15      8.59717750786414
   14    1.8938e-15      8.59717750786481
time = 2.622956, 0.174864 (0.000000e+00 - 2.622956e+00)/15, NITCG=25
locutus:/tmp/Omni-Test/cg$ OMPC_NUM_PROCS=4 ./cg-omp
omp_num_thread=1
omp_max_thread=4
    0    1.4000e-13      9.99864415791401
    1    2.3034e-15      8.57332792032217
    2    2.0425e-15      8.59545103740579
    3    1.9940e-15      8.59699723407375
    4    1.8712e-15      8.59715491517666
    5    2.0707e-15      8.59717443116078
    6    1.8496e-15      8.59717707049128
    7    1.9984e-15      8.59717744406296
    8    1.9273e-15      8.59717749839416
    9    1.7626e-15      8.59717750644093
   10    1.9777e-15      8.59717750764864
   11    1.8091e-15      8.59717750783180
   12    1.8619e-15      8.59717750785982
   13    1.7448e-15      8.59717750786414
   14    1.7789e-15      8.59717750786481
time = 1.347051, 0.089803 (0.000000e+00 - 1.347051e+00)/15, NITCG=25
locutus:/tmp/Omni-Test/cg$ OMPC_NUM_PROCS=8 ./cg-omp
omp_num_thread=1
omp_max_thread=8
    0    1.3958e-13      9.99864415791401
    1    2.2396e-15      8.57332792032217
    2    2.0384e-15      8.59545103740579
    3    1.9475e-15      8.59699723407375
    4    1.9677e-15      8.59715491517666
    5    2.1662e-15      8.59717443116078
    6    1.9009e-15      8.59717707049128
    7    1.8623e-15      8.59717744406296
    8    1.9301e-15      8.59717749839416
    9    1.9383e-15      8.59717750644093
   10    1.8409e-15      8.59717750764864
   11    1.8889e-15      8.59717750783180
   12    1.7497e-15      8.59717750785982
   13    1.8287e-15      8.59717750786414
   14    1.7180e-15      8.59717750786481
time = 1.841307, 0.122754 (0.000000e+00 - 1.841307e+00)/15, NITCG=25
locutus:/tmp/Omni-Test/cg$ OMPC_BIND_PROCS=false OMPC_NUM_PROCS=8 ./cg-omp
omp_num_thread=1
omp_max_thread=8
    0    1.3958e-13      9.99864415791401
    1    2.2396e-15      8.57332792032217
    2    2.0384e-15      8.59545103740579
    3    1.9475e-15      8.59699723407375
    4    1.9677e-15      8.59715491517666
    5    2.1662e-15      8.59717443116078
    6    1.9009e-15      8.59717707049128
    7    1.8623e-15      8.59717744406296
    8    1.9301e-15      8.59717749839416
    9    1.9383e-15      8.59717750644093
   10    1.8409e-15      8.59717750764864
   11    1.8889e-15      8.59717750783180
   12    1.7497e-15      8.59717750785982
   13    1.8287e-15      8.59717750786414
   14    1.7180e-15      8.59717750786481
time = 21.719778, 1.447985 (0.000000e+00 - 2.171978e+01)/15, NITCG=25
locutus:/tmp/Omni-Test/cg$ OMPC_LOG= OMPC_NUM_PROCS=8 ./cg-omp
log on ...
omp_num_thread=1
omp_max_thread=8
    0    1.3958e-13      9.99864415791401
[...]
time = 5.941640, 0.396109 (0.000000e+00 - 5.941640e+00)/15, NITCG=25
finalize log ...
locutus:/tmp/Omni-Test/cg$ tlogview cg-omp.log

Notice the possibility of diminishing returns when going from 4-way to 8-way. Also notice the effect of not binding threads to processors on a machine where others are running compute-heavy jobs in the background. It is highly recommended that you set the OMPC_BIND_PROCS=true environment variable.
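Working the speedups out from the wall-clock times above (serial time divided by parallel time) makes both effects concrete:

        S(2) = 5.178689 / 2.622956  ≈ 1.97
        S(4) = 5.178689 / 1.347051  ≈ 3.84
        S(8) = 5.178689 / 1.841307  ≈ 2.81    (diminishing returns)
        S(8, unbound) = 5.178689 / 21.719778  ≈ 0.24    (slower than serial)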