research-article

DOI: 10.1145/1993498.1993555

ALTER: exploiting breakable dependences for parallelization

Published: 04 June 2011

ABSTRACT

For decades, compilers have relied on dependence analysis to determine the legality of their transformations. While this conservative approach has enabled many robust optimizations, when it comes to parallelization there are many opportunities that can only be exploited by changing or re-ordering the dependences in the program.
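
As a minimal illustration of such an opportunity (our own construction, not an example taken from the paper, and written in C regardless of Alter's actual target language), consider a filtering loop whose only loop-carried dependence is its output index. Classical dependence analysis must serialize it, yet any iteration order produces the same set of outputs.

    /* Illustrative only: a loop that conservative dependence analysis must
     * serialize.  Every iteration reads and may write `count`, so each
     * iteration depends on all earlier ones -- even though the *set* of
     * values stored in `out` is identical under any iteration order. */
    #include <stddef.h>

    size_t filter_positive(const int *in, size_t n, int *out) {
        size_t count = 0;                  /* loop-carried: read/written each iteration */
        for (size_t i = 0; i < n; i++) {
            if (in[i] > 0) {
                out[count] = in[i];        /* slot depends on all prior iterations */
                count++;
            }
        }
        return count;
    }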

This paper presents Alter: a system for identifying and enforcing parallelism that violates certain dependences while preserving overall program functionality. Based on programmer annotations, Alter exploits new parallelism in loops by reordering iterations or allowing stale reads. Alter can also infer which annotations are likely to benefit the program by using a test-driven framework.
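
The sketch below suggests, under our own assumptions, how such a dependence might be broken in the spirit the abstract describes: iterations of the filtering loop above are grouped into chunks that commit their results in an order other than program order, and a test-style oracle accepts any outcome whose multiset of outputs matches the sequential run. The chunking scheme, the simulated reversed commit order, and the multiset check are illustrative choices only, not Alter's annotations or implementation (which also permits stale reads of shared state).

    /* Hedged sketch, not code from the paper: commit iteration chunks of the
     * filtering loop in an order other than program order, then apply a
     * test-style acceptance check (equal output counts and equal multisets). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Accept any output whose sorted contents match the sequential reference. */
    static int same_multiset(int *a, int *b, size_t n) {
        qsort(a, n, sizeof *a, cmp_int);
        qsort(b, n, sizeof *b, cmp_int);
        return memcmp(a, b, n * sizeof *a) == 0;
    }

    int main(void) {
        int in[] = { 3, -1, 7, 0, 5, -2, 9, 4 };
        size_t n = sizeof in / sizeof *in;
        enum { CHUNK = 2 };

        int seq[8], par[8];
        size_t seq_count = 0, par_count = 0;

        /* Reference: the sequential loop, in program order. */
        for (size_t i = 0; i < n; i++)
            if (in[i] > 0) seq[seq_count++] = in[i];

        /* "Parallel" schedule, simulated: chunks commit in reverse order. */
        for (size_t chunk = n; chunk > 0; chunk -= CHUNK)
            for (size_t i = chunk - CHUNK; i < chunk; i++)
                if (in[i] > 0) par[par_count++] = in[i];

        /* Acceptance test: same number of outputs and same multiset. */
        int ok = (seq_count == par_count) && same_multiset(seq, par, seq_count);
        printf("accepted: %s\n", ok ? "yes" : "no");
        return 0;
    }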

Our evaluation of Alter demonstrates that it uncovers parallelism that is beyond the reach of existing static and dynamic tools. Across a selection of 12 performance-intensive loops, 9 of which have loop-carried dependences, Alter obtains an average speedup of 2.0x on 4 cores.


Published in

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011, 668 pages. ISBN: 9781450306638. DOI: 10.1145/1993498. General Chair: Mary Hall; Program Chair: David Padua.

Also published in ACM SIGPLAN Notices, Volume 46, Issue 6 (PLDI '11), June 2011, 652 pages. ISSN: 0362-1340; EISSN: 1558-1160. DOI: 10.1145/1993316.

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States
