ABSTRACT
Speculative parallelization divides a sequential program into possibly parallel tasks and permits these tasks to run in parallel if and only if they exhibit no dependences on one another. The parallelization is safe in that a speculative execution always produces the same output as the sequential execution.
In this paper, we present the dependence hint, an interface for a user to specify possible dependences between possibly parallel tasks. Dependence hints may be incorrect or incomplete, but they do not change the program output. The interface extends Cytron's do-across and recent OpenMP ordering primitives and makes them safe and safely composable. We use it to express conditional and partial parallelism and to parallelize large legacy code. The prototype system is implemented as a software library. It improves performance by nearly 10 times on average on current multicore machines for 8 programs, including 5 SPEC benchmarks.
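The safety property described above can be illustrated with a small sketch. It is not the paper's library or its interface: the names `TrackedState`, `run_speculatively`, `run_sequentially`, and `TASKS` are hypothetical, the speculative phase is simulated in an ordinary loop rather than with real threads or processes, and a hint is simplified to mean "do not speculate on the later task; run it after all earlier tasks have committed". Under those assumptions, the sketch shows that a missing or wrong hint affects only how much work is redone or serialized, never the final state.

```python
class TrackedState:
    """View of a state dict that buffers writes and records the keys it reads."""

    def __init__(self, base):
        self.base = base      # state this task observes (never mutated here)
        self.writes = {}      # private write buffer, applied at commit time
        self.reads = set()    # keys read from the base state

    def get(self, key):
        if key in self.writes:        # a task sees its own earlier writes
            return self.writes[key]
        self.reads.add(key)
        return self.base[key]

    def set(self, key, value):
        self.writes[key] = value


def run_sequentially(tasks, state):
    """Reference execution: run every task in program order."""
    for task in tasks:
        view = TrackedState(state)
        task(view)
        state.update(view.writes)


def run_speculatively(tasks, state, hints=()):
    """Speculate on tasks, commit in program order, redo a task on conflict.

    hints is an iterable of (earlier, later) task-index pairs, each a guess
    that `later` depends on `earlier`. Returns the indices of tasks whose
    speculative attempt had to be discarded and redone.
    """
    snapshot = dict(state)
    hinted = {later for _, later in hints}

    # Speculative phase (simulated): unhinted tasks all start from the snapshot.
    attempts = {}
    for i, task in enumerate(tasks):
        if i not in hinted:
            attempts[i] = TrackedState(snapshot)
            task(attempts[i])

    # Commit phase: program order, with a read-set check against earlier writes.
    dirty = set()     # keys written by tasks committed so far
    redone = []
    for i, task in enumerate(tasks):
        view = attempts.get(i)
        if view is None or view.reads & dirty:
            if view is not None:
                redone.append(i)              # failed speculation
            view = TrackedState(dict(state))  # rerun on the up-to-date state
            task(view)
        state.update(view.writes)
        dirty.update(view.writes)
    return redone


def scale_a(s):
    s.set("a", s.get("a") * 10)

def bump_b(s):
    s.set("b", s.get("b") + 1)

def sum_into_c(s):
    s.set("c", s.get("a") + s.get("b"))   # depends on the two tasks before it

def bump_d(s):
    s.set("d", s.get("d") + 5)

TASKS = [scale_a, bump_b, sum_into_c, bump_d]
```

With an initial state of `{"a": 1, "b": 2, "c": 0, "d": 0}`, calling `run_speculatively(TASKS, state)` without hints discards and redoes the third task, while passing `hints=[(0, 2)]` avoids the wasted attempt. A wrong hint such as `(0, 3)` merely serializes an independent task. In all three cases the final state matches `run_sequentially`.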
Safe parallel programming using dynamic dependence hints. OOPSLA '11.