Cilk Parallel Computing
Using Cilk at Dalhousie

 

1. Introduction

Note: The content of this document was largely extracted from the Cilk Manual and adapted to the local Dalhousie environment. For more information about Cilk see the CILK Short Introduction and the Cilk Website.

Cilk is a language for multithreaded parallel programming based on ANSI C. Cilk is designed for general-purpose parallel programming, but it is especially effective for exploiting dynamic, highly asynchronous parallelism, which can be difficult to write in data-parallel or message-passing style. Cilk was originally developed by the Supercomputing Technologies Group at the MIT Laboratory for Computer Science under the supervision of Prof. Charles E. Leiserson.

Cilk is designed for symmetric multiprocessors (SMPs) such as Locutus and Borg. Locutus, for example, is a Sun Enterprise 4500 with 3 GB of RAM and 8 processors running Solaris 7, which makes it an ideal machine for trying out Cilk programs. NOTE: You must use ssh to log into Locutus; the machine does not respond to Telnet requests.

2. Preliminary Setup

3. How to Compile and run your Cilk program

To compile Cilk programs, Cilk installs the cilk command, which is a special version of the gcc compiler. The cilk command recognizes files with the .cilk extension as Cilk programs and compiles them accordingly. In general, cilk accepts the same arguments as gcc, with the exception of -ansi. For example, assume that the source code for a Fibonacci program is in a file called fib.cilk.

This program can be obtained as follows:

Locutus% wget www.cs.dal.ca/~arc/resources/Cilk/fib.cilk

To compile this program one might type:

Locutus% cilk -O2 fib.cilk -o fib

to produce the fib executable. You can also use your favorite gcc options, such as -g, -Wall, and so on.

To run the program you might type:

Locutus% fib --nproc 4 30

This starts fib on 4 processors to compute the 30th Fibonacci number. When the program finishes, you should see output similar to the following:

Result: 832040

4. How to Profile your Cilk program

During program development, it is useful to collect performance data about a Cilk program. The Cilk runtime system collects this information when a program is compiled with the flags -cilk-profile and -cilk-critical-path.

Locutus% cilk -cilk-profile -cilk-critical-path -O2 fib.cilk -o fib

The -cilk-profile flag instructs Cilk to collect data about how much time each processor spends working, how many thread migrations occur, how much memory is allocated, and so on. The -cilk-critical-path flag enables measurement of the critical path. (The critical path is the ideal execution time of your program on an infinite number of processors; for a precise definition, see Section 2.8 of the Cilk manual.) We distinguish the -cilk-critical-path option from -cilk-profile because critical-path measurement involves many calls to the timing routines and may slow programs down significantly, whereas -cilk-profile still provides useful profiling information without too much overhead. A Cilk program compiled with profiling support can be instructed to print performance information with the --stats option. The command line

Locutus% fib --nproc 4 --stats 1 30

yields an output similar to the following:

Result: 832040
RUNTIME SYSTEM STATISTICS:
Wall-clock running time on 4 processors: 2.593270 s
Total work = 10.341069 s
Total work (accumulated) = 7.746886 s
Critical path = 779.588000 us
Parallelism = 9937.154003

After the output of the program itself, the runtime system reports several useful statistics collected during execution; setting the statistics level higher than 1 produces additional statistics. The parallelism, which is the ratio of total work to critical-path length (in the output above, the accumulated total work is the figure used), is also printed so that you can gauge the scalability of your application. If you run an application on far fewer processors than its parallelism, you should expect linear speedup.

The Cilk runtime system accepts the following standard options, which must be specified first on a Cilk program's command line:

--help List available options.

--nproc n Use n processors in the computation. If n = 0, use as many processors as are available. The parameter n must be at least 0 and be no greater than the machine size. The default is to use 1 processor.

--stats l Set the statistics level to l. The higher the level, the more detailed the information. The default level is 0, which disables printing of statistics. Level 1 prints the wall-clock running time, the total work, and the critical-path length. The exact set of statistics printed depends on the flags with which the runtime system and the application were compiled. The highest meaningful statistics level is currently 6.

--no-stats Equivalent to --stats 0.

--stack size Set a maximum frame stack size of size entries. The frame stack size limits the depth of recursion allowed in the program. The default depth is 32768.

-- Force Cilk option parsing to stop. Subsequent options will be passed to the Cilk program unaltered.

These options must be specified before all user options, since they are processed by the Cilk runtime system and stripped away before the rest of the user options are passed to main.

5. An Example Session

	Locutus% wget www.cs.dal.ca/~arc/resources/Cilk/fib.cilk
	Locutus% cilk -cilk-profile -cilk-critical-path -O2 fib.cilk -o fib
	Locutus% fib --nproc 4 --stats 1 30
	Result: 832040
	RUNTIME SYSTEM STATISTICS:
	Wall-clock running time on 4 processors: 2.593270 s
	Total work = 10.341069 s
	Total work (accumulated) = 7.746886 s
	Critical path = 779.588000 us
	Parallelism = 9937.154003