CSci4125 - Assignment 1: Theory

Revision History:
Created January 5th, 2014

Readings

Please read and take some notes on the following.


Question 1:
Give brief answers to the following questions.

(a) What is the difference between a SIMD and MIMD parallel machine?

(b) What is the difference between a shared memory and distributed memory parallel machine? Give examples.

(c) Classify a standard quad-core processor according to the criteria in (a) and (b).

(d) Suppose the best sequential algorithm for a given problem requires n log n steps. What is the maximum number of steps allowed for a parallel algorithm for p processors in order to obtain optimum (linear) speedup? Why?

(e) What is a race condition? When and why do race conditions occur?
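As a concrete illustration for part (e), the classic lost-update race can be simulated deterministically. The function name and interleaving below are illustrative only, not part of the assignment:

```python
# Deterministic simulation of a lost-update race condition.
# Two "threads" each try to increment a shared counter, but their
# read-modify-write steps interleave, so one increment is lost.

def interleaved_increments(counter):
    # Step 1: both threads read the same stale value.
    read_a = counter
    read_b = counter
    # Step 2: both compute their update from the stale read.
    write_a = read_a + 1
    write_b = read_b + 1
    # Step 3: thread B's write overwrites thread A's write.
    counter = write_a
    counter = write_b
    return counter

print(interleaved_increments(0))  # prints 1, not the expected 2: one update is lost
```

In a real multithreaded program this interleaving happens nondeterministically, which is why race conditions are hard to reproduce.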

Question 2:
(a) Give the equation for speedup. Be sure to define your terms.

(b) What is the difference between speedup and relative speedup?

(c) Consider the following experimental data, captured by running the code Parallel_Blah on 1 million elements. Draw running-time, speedup, and efficiency graphs. Be sure to create proper graph titles, label all axes, and include the linear speedup curve.

P:    1    2    3    4    5    6    7    8
Time: 200  80   66   53   47   42   44   50
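As a starting point for part (c), the speedup and efficiency values behind the graphs can be computed directly from the table. This is a minimal sketch; the plotting itself is left to your tool of choice:

```python
# Compute speedup S(p) = T(1)/T(p) and efficiency E(p) = S(p)/p
# from the measured running times in the table above.

procs = [1, 2, 3, 4, 5, 6, 7, 8]
times = [200, 80, 66, 53, 47, 42, 44, 50]

t1 = times[0]  # sequential (p = 1) running time
for p, t in zip(procs, times):
    speedup = t1 / t
    efficiency = speedup / p
    print(f"p={p}: time={t}, speedup={speedup:.2f}, efficiency={efficiency:.2f}")
```

These values give the data points for the speedup and efficiency graphs; the linear speedup curve is simply S(p) = p.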

(d) With respect to the graphs from the previous question, comment on what you observe.

Question 3:
A word w = x1x2...xn, where each xi is a lower-case character, is called a palindrome if it reads the same forwards and backwards (e.g. aasbsaa is a palindrome, but aasbb isn't).

(a) Design and write pseudo code for a shared memory parallel algorithm to determine whether a word w is a palindrome.

(b) Using Big-Oh notation, analyze the runtime and expected speedup of your algorithm for the following two cases: 1) w is a palindrome, 2) w is not a palindrome.

(c) Discuss what you might expect to observe from an experimental analysis of an implementation of your algorithm.


Question 4:
Consider a processor cluster with p processors P[1], …, P[p] connected via an Ethernet switch. Each processor stores an array containing N/p numbers, for a total of N numbers over all processors.

(a) Write pseudocode for a parallel MPI algorithm that uses only MPI_Alltoall for communication and determines, for every processor P[i], the smallest number stored at processors P[1], …, P[i]. Your algorithm may assume that N/p > p (i.e. every processor can temporarily store O(p) data items) but should use no more than O(1) MPI_Alltoall or MPI_Alltoallv communication operations.

Example: p = 4, N/p = 3

Processor id:  1        2        3         4
Input:         8,3,7    6,2,4    5,10,11   7,1,9
Result:        3        2        2         1
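Before designing the algorithm, it may help to pin down what MPI_Alltoall does: processor i sends its j-th buffer entry to processor j, so the operation amounts to transposing the matrix of send buffers. A pure-Python simulation of this data movement (the helper name is illustrative, not an MPI call):

```python
# Simulate the data movement of MPI_Alltoall on p processors:
# send[i][j] is the item processor i sends to processor j;
# recv[i][j] is the item processor i receives from processor j.
# The receive buffers are exactly the transpose of the send buffers.

def simulate_alltoall(send):
    p = len(send)
    return [[send[j][i] for j in range(p)] for i in range(p)]

# Example with p = 4: value i*10 + j travels from processor i to processor j
# (using 1-based labels in the values for readability).
send = [[11, 12, 13, 14],
        [21, 22, 23, 24],
        [31, 32, 33, 34],
        [41, 42, 43, 44]]
recv = simulate_alltoall(send)
print(recv[0])  # [11, 21, 31, 41]: processor 1 gathers one item from every peer
```

Note that a single such exchange lets every processor learn one value from every other processor, which is the communication budget the question allows.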

(b) How many communication rounds does your algorithm require?

(c) How much local (sequential) computation does your algorithm require?

Note: No late assignments will be accepted.