Gregor Samsa wakes in his bed and discovers he has transformed into a giant BUG.
We are going to implement a parallel programming framework library for C (or C++). It would mimic the semantics of
sync in Cilk, where the programmer simply identifies the independent parallel tasks and library would take care of the scheduling and work stealing.
Cilk proposed an interesting parallel programming model which hide the thread execution policies from the programmers so that the programmer simply identified the parallelism and runtime will take care of implementation mappings. Its most prominent feature, work stealing scheduler, was proved to enjoy some nice spatial and time properties. While it extended the syntax of C, we tend to add these features as a C(++) library. As a result, however, the programmers have to follow certain disciplines. We try to provide those syntactic and semantics sugars, like work stealing sheduling and performance guarantees in our library and make things as easy as possible.
Continuation: In order to distribute the jobs among workers, we have to abstract them in a managable way and be able to handle them under stack disciplines.
Thread Tree: As in Cilk, each threaded procedure is implicitly synced just before return, which means we need to maintain a global thread “tree” data structure that potentially can be accessed by all the workers simultaneously, where each node represents a thread and each edge corresponds either a
spawn site or a
sync site. The manipulation of the tree must be scalable to achieve good performance.
Thread Scheduling and Work Stealing: With multiple continuations in hands, which threads should be picked to work on them and how should stealing be implemented in a way that can scale across many cores. Perhaps with the extra information about the characteristic of the job, we hope to come up with a more reasonable/workload aware sheduling.
We will start the project from scratch. The implementation will be tested on multi-core computers in different scale (personal computer, ghc cluster, latedays and hopefully an equipment with more cores).
Goals and deliverables
Our goal is to implement a parallel programming framework which supports
sync semantics and work-stealing scheduler. The first implementation will use pthread library on Linux.
To be specific, the
spawn should create a new task to schedule, who can take arguments and whose return value can be stored to a designated location asynchrously and be used later, just like an assignment in Cilk with
spawn on the right hand side and a variable on the left hand side. The worker pool based architecture would enforce workload distribution by work stealing scheduling.
sync will not resume until all the descendant tasks are completed. Before return from a spawned function, there will be an implicit
sync just as in Cilk.
We will test our implement on different applications on computers with different number of cores.
More we hope to achieve
If we have time, we will get rid of pthead library and implement our own version of thread abstraction. We try to implement more lightweight solution.
If everything goes well, we will introduce more advanced features into our thread library, such as
We will evaluate our framework by rewriting several applications originally in Cilk. Those applications are carefully selected from Cilk paper benchmarks in order to test the performance under different scenarios with the state-of-art implementation.
The applications are described below (the orginal implementation is from ):
fib(n)Calculate the nth Fibonacci number
heat(n)Heat-diffusion calculation on an m x n mesh
knapsack(n)Solve the 0-1 knapsack problem on n items
lu(n)LU-decomposition (without pivoting) of a dense n x n matrix
magic(n)Computes the number of n x n magic squares
matmul(n)Rectangular matrix multiplication
plu(n)Parallel LU Decomposition with Partial Pivoting
|April 1-April 3||Literature review on work stealing and Cilk|
|April 4-April 10||A prototype library that allows spawning no-argument tasks and sync tasks|
|April 11-April 17||A prototype with
|April 18-April 24||An implementation of work stealing protocol in a potentially non-scalable way + second benchmark test|
|April 25-May 1||Scale work stealing based on second benchmark test|
|May 2-May 9||Toward a self-implementated thread and lock|
Blumofe, Robert D., et al. “Cilk: An efficient multithreaded runtime system.” Journal of parallel and distributed computing 37.1 (1996): 55-69. ↩
Blumofe, Robert D., and Charles E. Leiserson. “Scheduling multithreaded computations by work stealing.” Journal of the ACM (JACM) 46.5 (1999): 720-748. ↩
Frigo, Matteo, Charles E. Leiserson, and Keith H. Randall. “The implementation of the Cilk-5 multithreaded language.” ACM Sigplan Notices. Vol. 33. No. 5. ACM, 1998. ↩
Joerg, Christopher F. The cilk system for parallel multithreaded computing. Diss. Massachusetts Institute of Technology, 1996. ↩
Blumofe, Robert D. Executing multithreaded programs efficiently. Diss. Massachusetts Institute of Technology, 1995. ↩
Kielmann, T., R. Nieuwpoort, and H. Bal. “Cilk-5.3 Reference Manual.” Supercomputing Technologies Group (2000). ↩