Note: The content of this document was largely extracted from the Cilk Manual and adapted to the local Dalhousie environment. For more information about Cilk see the CILK Short Introduction and the Cilk Website.
Cilk is a language for multithreaded parallel programming based on ANSI C. Cilk is designed for general-purpose parallel programming, but it is especially effective for exploiting dynamic, highly asynchronous parallelism, which can be difficult to write in data-parallel or message-passing style. Cilk was originally developed by the Supercomputing Technologies Group at the MIT Laboratory for Computer Science under the supervision of Prof. Charles E. Leiserson.
Cilk is designed for Symmetric Multi-Processors (SMP) like Locutus and Borg. For example, Locutus is a Sun Enterprise 4500 with 3 GB RAM and 8 processors running Solaris 7. An ideal machine to try out Cilk programs with! NOTE: You must use ssh to log into Locutus as this machine does not respond to Telnet requests!
Cilk uses a number of specialized commands which are not in most people's path. The FCS installation of Cilk is in /opt/cilk, so you should add these lines to your .bashrc:
if [ -z "$SYSPATH" ]
then
    PATH="$PATH:$HOME/bin:.:/opt/cilk/bin"
else
    PATH="$HOME/bin:$SYSPATH:.:/opt/cilk/bin"
fi
export PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cilk/lib
In order to compile Cilk programs, Cilk installs the cilk command, which is a special version of the gcc compiler. The cilk command understands that files with the .cilk extension are Cilk programs and acts accordingly. In general, cilk accepts the same arguments as the gcc compiler (with the exception of -ansi). For example, assume that the source code for a Fibonacci program is in a file called fib.cilk.
This program can be obtained as follows:
Locutus% wget www.cs.dal.ca/~arc/resources/Cilk/fib.cilk
To compile this program one might type:
Locutus% cilk -O2 fib.cilk -o fib
to produce the fib executable. You can also use your favorite gcc options, such as -g, -Wall, and so on.
To run the program you might type:
Locutus% fib --nproc 4 30
This starts fib on 4 processors to compute the 30th Fibonacci number. At the end of the execution, you should see a printout similar to the following:
During program development, it is useful to collect performance data about a Cilk program. The Cilk runtime system collects this information when a program is compiled with the flags -cilk-profile and -cilk-critical-path.
Locutus% cilk -cilk-profile -cilk-critical-path -O2 fib.cilk -o fib
The flag -cilk-profile instructs Cilk to collect data about how much time each processor spends working, how many thread migrations occur, how much memory is allocated, etc. The flag -cilk-critical-path enables measurement of the critical path. (The critical path is the ideal execution time of your program on an infinite number of processors. For a precise definition of the critical path, see Section 2.8 of the Cilk manual.) We distinguish the -cilk-critical-path option from -cilk-profile because critical-path measurement involves many calls to the timing routines, and it may slow down programs significantly. In contrast, -cilk-profile still provides useful profiling information without too much overhead. A Cilk program compiled with profiling support can be instructed to print performance information by using the --stats option. The command line
Locutus% fib --nproc 4 --stats 1 30
yields an output similar to the following:
After the output of the program itself, the runtime system reports several useful statistics collected during execution. Additional statistics can be obtained by setting the statistics level higher than 1. The parallelism, which is the ratio of total work to critical-path length, is also printed so that you can gauge the scalability of your application. If you run an application on many fewer processors than the parallelism, then you should expect linear speedup.
The Cilk runtime system accepts the following standard options, which must be specified first on a Cilk program's command line:
--help List available options.
--nproc n Use n processors in the computation. If n = 0, use as many processors as are available. The parameter n must be at least 0 and be no greater than the machine size. The default is to use 1 processor.
--stats l Set the statistics level to l. The higher the level, the more detailed the information. The default level is 0, which disables any printing of statistics. Level 1 prints the wall-clock running time, the total work, and the critical-path length. The exact set of statistics printed depends on the flags with which the runtime system and the application were compiled. The highest meaningful statistics level is currently 6.
--no-stats Equivalent to --stats 0.
--stack size Set a maximum frame stack size of size entries. The frame stack size limits the depth of recursion allowed in the program. The default depth is 32768.
-- Force Cilk option parsing to stop. Subsequent options will be passed to the Cilk program unaltered.
These options must be specified before all user options, since they are processed by the Cilk runtime system and stripped away before the rest of the user options are passed to main.
Locutus% wget www.cs.dal.ca/~arc/resources/Cilk/fib.cilk
Locutus% cilk -cilk-profile -cilk-critical-path -O2 fib.cilk -o fib
Locutus% fib --nproc 4 --stats 1 30
Result: 832040
RUNTIME SYSTEM STATISTICS:
Wall-clock running time on 4 processors: 2.593270 s
Total work = 10.341069 s
Total work (accumulated) = 7.746886 s
Critical path = 779.588000 us
Parallelism = 9937.154003