Assignment 3

Kakro, also known as cross sums, is a recent puzzle craze. You can find Kakro puzzles in both the Guardian and the National Post newspapers. These puzzles look a bit  like "math based" crossword puzzle. The numbers, that appear in the black margins, represent the sums of the digits that you will insert into the empty squares. A number above a diagonal line refers to the empty squares to the right of that number, and a number below a diagonal line refers to the squares below that number. No zeros appear in the puzzle, and no digit is repeated in a particular number. An example taken from the Dell Magazine Cross Sum page is given below.

a) Create a sequential C++ program that reads a Kakro puzzle from a text file and using exhustive recursive enumeration finds the unique solution. The key recursive routine might look something like this:

     solve(int location)
       If location == gridSize // puzzle has been solved
          display();
          return true;
       else
          Check each integer v between 1 and 9
              if v is OK in location //meaning that row, col and no duplicates conditions are met
                 set location to v
                 return solve(nextEmptyLocation(l))
              else return false

If you are unsure of how to go about designing a recursive enumeration you might want to look over this sample java Maze_Search program. You might be interested in this sample input formats and a matching output display formats.

b) Now design and implement a parallel shared memory version of your Kakro puzzle solver using Cilk. Measure the runtime of your program for several different sized grids. Use the cilkview program to determine the parallelism for the same grids. Create a sequential version of your program by removing all "spawn" and "sync" statements. Leave the time measurment code unchanged and compile using the Cilk compiler. Perform the same test runs as above and measure the sequential times. Compare the sequential times with the parallel times and relate them to the number of processor cores available. How good was your speedup? Hand in your documented code and a readme file how to run it. Hand in the transcripts of your test runs (copy paste from terminal into a text file). Discuss in your documentation the results you expected and explain the reasons for any non-ideal behavior you observe. In particular, if your program does not achieve perfect speedup, explain why. Is it due to work imbalance? Communication/synchronization overhead? Performing redundant (or unnecessary) work? Is it possible to achieve better than perfect speedup? Provide measurements to back up your explanations.