MallocAnswer's BLOG


  • Home

  • Proposal

  • Checkpoint

  • Report

New Project Webpage

Proposal

Gregor

Gregor Samsa wakes in his bed and discovers he has transformed into a giant BUG.

Zhan Chen (zhanc1) and Lehao Sun (lehaos)

Summary

We are going to implement a parallel programming framework library for C (or C++). It would mimic the semantics of spawn and sync in Cilk, where the programmer simply identifies the independent parallel tasks and library would take care of the scheduling and work stealing.

Background

Cilk proposed an interesting parallel programming model which hide the thread execution policies from the programmers so that the programmer simply identified the parallelism and runtime will take care of implementation mappings[1]. Its most prominent feature, work stealing scheduler[2], was proved to enjoy some nice spatial and time properties. While it extended the syntax of C, we tend to add these features as a C(++) library. As a result, however, the programmers have to follow certain disciplines. We try to provide those syntactic and semantics sugars, like work stealing sheduling and performance guarantees in our library and make things as easy as possible.

Challenge

Continuation[3]: In order to distribute the jobs among workers, we have to abstract them in a managable way and be able to handle them under stack disciplines.

Thread Tree[4]: As in Cilk, each threaded procedure is implicitly synced just before return, which means we need to maintain a global thread “tree” data structure that potentially can be accessed by all the workers simultaneously, where each node represents a thread and each edge corresponds either a spawn site or a sync site. The manipulation of the tree must be scalable to achieve good performance.

Thread Scheduling and Work Stealing: With multiple continuations in hands, which threads should be picked to work on them and how should stealing be implemented in a way that can scale across many cores. Perhaps with the extra information about the characteristic of the job, we hope to come up with a more reasonable/workload aware sheduling.

Resources

We will start the project from scratch. The implementation will be tested on multi-core computers in different scale (personal computer, ghc cluster, latedays and hopefully an equipment with more cores).

Goals and deliverables

Basic goals

Our goal is to implement a parallel programming framework which supports spawn and sync semantics and work-stealing scheduler. The first implementation will use pthread library on Linux.

To be specific, the spawn should create a new task to schedule, who can take arguments and whose return value can be stored to a designated location asynchrously and be used later, just like an assignment in Cilk with spawn on the right hand side and a variable on the left hand side. The worker pool based architecture would enforce workload distribution by work stealing scheduling[5].

sync will not resume until all the descendant tasks are completed. Before return from a spawned function, there will be an implicit sync just as in Cilk.

We will test our implement on different applications on computers with different number of cores.

More we hope to achieve

If we have time, we will get rid of pthead library and implement our own version of thread abstraction. We try to implement more lightweight solution.

If everything goes well, we will introduce more advanced features into our thread library, such as inlet and abort.

Benchmarks

We will evaluate our framework by rewriting several applications originally in Cilk. Those applications are carefully selected from Cilk paper benchmarks in order to test the performance under different scenarios with the state-of-art implementation.
The applications are described below (the orginal implementation is from [6]):

  • fib(n) Calculate the nth Fibonacci number
  • heat(n) Heat-diffusion calculation on an m􏰾 x n mesh
  • knapsack(n) Solve the 0-1 knapsack problem on n items
  • lu(n) LU-decomposition (without pivoting) of a dense n x n matrix
  • magic(n) Computes the number of n x n magic squares
  • matmul(n) Rectangular matrix multiplication
  • plu(n) Parallel LU Decomposition with Partial Pivoting

Schedule

Time Plan Check
April 1-April 3 Literature review on work stealing and Cilk
April 4-April 10 A prototype library that allows spawning no-argument tasks and sync tasks
April 11-April 17 A prototype with spawn, sync semantics similar in Cilk and able to set return value asynchronously + first benchmark test
April 18-April 24 An implementation of work stealing protocol in a potentially non-scalable way + second benchmark test
April 25-May 1 Scale work stealing based on second benchmark test
May 2-May 9 Toward a self-implementated thread and lock

Reference


  1. Blumofe, Robert D., et al. “Cilk: An efficient multithreaded runtime system.” Journal of parallel and distributed computing 37.1 (1996): 55-69. ↩

  2. Blumofe, Robert D., and Charles E. Leiserson. “Scheduling multithreaded computations by work stealing.” Journal of the ACM (JACM) 46.5 (1999): 720-748. ↩

  3. Frigo, Matteo, Charles E. Leiserson, and Keith H. Randall. “The implementation of the Cilk-5 multithreaded language.” ACM Sigplan Notices. Vol. 33. No. 5. ACM, 1998. ↩

  4. Joerg, Christopher F. The cilk system for parallel multithreaded computing. Diss. Massachusetts Institute of Technology, 1996. ↩

  5. Blumofe, Robert D. Executing multithreaded programs efficiently. Diss. Massachusetts Institute of Technology, 1995. ↩

  6. Kielmann, T., R. Nieuwpoort, and H. Bal. “Cilk-5.3 Reference Manual.” Supercomputing Technologies Group (2000). ↩

MallocAnswer

MallocAnswer

My Heart Is In The Work

1 posts
Github Linkedin Facebook
© 2016 MallocAnswer
Powered by Hexo
Theme - NexT.Mist