Proposal
Gregor
Gregor Samsa wakes in his bed and discovers he has transformed into a giant BUG.
Zhan Chen (zhanc1) and Lehao Sun (lehaos)
Summary
We are going to implement a parallel programming framework library for C (or C++). It would mimic the semantics of spawn
and sync
in Cilk, where the programmer simply identifies the independent parallel tasks and library would take care of the scheduling and work stealing.
Background
Cilk proposed an interesting parallel programming model which hide the thread execution policies from the programmers so that the programmer simply identified the parallelism and runtime will take care of implementation mappings[1]. Its most prominent feature, work stealing scheduler[2], was proved to enjoy some nice spatial and time properties. While it extended the syntax of C, we tend to add these features as a C(++) library. As a result, however, the programmers have to follow certain disciplines. We try to provide those syntactic and semantics sugars, like work stealing sheduling and performance guarantees in our library and make things as easy as possible.
Challenge
Continuation[3]: In order to distribute the jobs among workers, we have to abstract them in a managable way and be able to handle them under stack disciplines.
Thread Tree[4]: As in Cilk, each threaded procedure is implicitly synced just before return, which means we need to maintain a global thread “tree” data structure that potentially can be accessed by all the workers simultaneously, where each node represents a thread and each edge corresponds either a spawn
site or a sync
site. The manipulation of the tree must be scalable to achieve good performance.
Thread Scheduling and Work Stealing: With multiple continuations in hands, which threads should be picked to work on them and how should stealing be implemented in a way that can scale across many cores. Perhaps with the extra information about the characteristic of the job, we hope to come up with a more reasonable/workload aware sheduling.
Resources
We will start the project from scratch. The implementation will be tested on multi-core computers in different scale (personal computer, ghc cluster, latedays and hopefully an equipment with more cores).
Goals and deliverables
Basic goals
Our goal is to implement a parallel programming framework which supports spawn
and sync
semantics and work-stealing scheduler. The first implementation will use pthread library on Linux.
To be specific, the spawn
should create a new task to schedule, who can take arguments and whose return value can be stored to a designated location asynchrously and be used later, just like an assignment in Cilk with spawn
on the right hand side and a variable on the left hand side. The worker pool based architecture would enforce workload distribution by work stealing scheduling[5].
sync
will not resume until all the descendant tasks are completed. Before return from a spawned function, there will be an implicit sync
just as in Cilk.
We will test our implement on different applications on computers with different number of cores.
More we hope to achieve
If we have time, we will get rid of pthead library and implement our own version of thread abstraction. We try to implement more lightweight solution.
If everything goes well, we will introduce more advanced features into our thread library, such as inlet
and abort
.
Benchmarks
We will evaluate our framework by rewriting several applications originally in Cilk. Those applications are carefully selected from Cilk paper benchmarks in order to test the performance under different scenarios with the state-of-art implementation.
The applications are described below (the orginal implementation is from [6]):
fib(n)
Calculate the nth Fibonacci numberheat(n)
Heat-diffusion calculation on an m x n meshknapsack(n)
Solve the 0-1 knapsack problem on n itemslu(n)
LU-decomposition (without pivoting) of a dense n x n matrixmagic(n)
Computes the number of n x n magic squaresmatmul(n)
Rectangular matrix multiplicationplu(n)
Parallel LU Decomposition with Partial Pivoting
Schedule
Time | Plan | Check |
---|---|---|
April 1-April 3 | Literature review on work stealing and Cilk | |
April 4-April 10 | A prototype library that allows spawning no-argument tasks and sync tasks | |
April 11-April 17 | A prototype with spawn , sync semantics similar in Cilk and able to set return value asynchronously + first benchmark test |
|
April 18-April 24 | An implementation of work stealing protocol in a potentially non-scalable way + second benchmark test | |
April 25-May 1 | Scale work stealing based on second benchmark test | |
May 2-May 9 | Toward a self-implementated thread and lock |
Reference
Blumofe, Robert D., et al. “Cilk: An efficient multithreaded runtime system.” Journal of parallel and distributed computing 37.1 (1996): 55-69. ↩
Blumofe, Robert D., and Charles E. Leiserson. “Scheduling multithreaded computations by work stealing.” Journal of the ACM (JACM) 46.5 (1999): 720-748. ↩
Frigo, Matteo, Charles E. Leiserson, and Keith H. Randall. “The implementation of the Cilk-5 multithreaded language.” ACM Sigplan Notices. Vol. 33. No. 5. ACM, 1998. ↩
Joerg, Christopher F. The cilk system for parallel multithreaded computing. Diss. Massachusetts Institute of Technology, 1996. ↩
Blumofe, Robert D. Executing multithreaded programs efficiently. Diss. Massachusetts Institute of Technology, 1995. ↩
Kielmann, T., R. Nieuwpoort, and H. Bal. “Cilk-5.3 Reference Manual.” Supercomputing Technologies Group (2000). ↩