
A technique for the effective and automatic reuse of classical compiler optimizations on multithreaded code

Published: 26 January 2011

Abstract

A large body of data-flow analyses exists for analyzing and optimizing sequential code. Unfortunately, much of it cannot be applied directly to parallel code without compromising correctness. This paper presents a technique to automatically, aggressively, yet safely apply sequentially sound data-flow transformations, without change, to shared-memory programs. The technique is founded on the notion of program references being "siloed" on certain control-flow paths. Intuitively, siloed references are free of interference from other threads within the confines of such paths. Data-flow transformations can, in general, be unblocked on siloed references.
The solution has been implemented in a widely used compiler. Results on benchmarks from SPLASH-2 show that performance improvements of up to 41% are possible, with an average improvement of 6% across all the tested programs over all thread counts.
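
As a loose illustration of the intuition (not the paper's formal definition or its implementation), the C sketch below assumes a POSIX threads program in which a shared variable is only ever accessed while a mutex is held. On the path between the lock and the unlock, no other thread can interfere with that variable, so a sequentially sound transformation such as keeping the value in a local temporary preserves the result. The names `shared_total`, `add_twice_plain`, `add_twice_cached`, and `run_demo` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total = 0;   /* assumption: only accessed while holding lock */

/* As written by the programmer: two loads and two stores of shared_total. */
void add_twice_plain(long x) {
    pthread_mutex_lock(&lock);
    shared_total = shared_total + x;
    shared_total = shared_total + x;
    pthread_mutex_unlock(&lock);
}

/* Hand-applied redundant load/store elimination: one load, one store.
 * Between lock and unlock no other thread touches shared_total, so
 * caching it in a local gives the same final value. */
void add_twice_cached(long x) {
    pthread_mutex_lock(&lock);
    long t = shared_total;
    t = t + x;
    t = t + x;
    shared_total = t;
    pthread_mutex_unlock(&lock);
}

struct job { void (*fn)(long); long x; int iters; };

static void *worker(void *arg) {
    struct job *j = (struct job *)arg;
    for (int i = 0; i < j->iters; i++)
        j->fn(j->x);
    return NULL;
}

/* Runs fn(1) iters times on each of nthreads threads; returns the total. */
long run_demo(void (*fn)(long), int nthreads, int iters) {
    pthread_t tids[MAX_THREADS];
    struct job j = { fn, 1, iters };
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_total = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &j);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return shared_total;   /* safe to read: all workers have been joined */
}
```

Both variants produce the same total for the same thread count and iteration count. Outside the locked path the same rewrite would not be safe in general, because another thread could update `shared_total` between the two statements. On toolchains that need it, compile with `-pthread`.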

    Published In

    POPL '11: Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
    January 2011
    652 pages
    ISBN: 9781450304900
    DOI: 10.1145/1926385
    • ACM SIGPLAN Notices, Volume 46, Issue 1
      POPL '11
      January 2011
      624 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/1925844
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data-flow analysis
    2. parallel-program optimization

    Qualifiers

    • Research-article

    Conference

    POPL '11
    Acceptance Rates

    Overall Acceptance Rate 860 of 4,328 submissions, 20%

    Cited By

    • (2019) Tapir. ACM Transactions on Parallel Computing 6:4 (1-33). DOI: 10.1145/3365655. Online publication date: 17-Dec-2019
    • (2018) Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation. IEEE Transactions on Parallel and Distributed Systems 29:3 (527-541). DOI: 10.1109/TPDS.2017.2771509. Online publication date: 1-Mar-2018
    • (2017) Automatic detection of extended data-race-free regions. Proceedings of the 2017 International Symposium on Code Generation and Optimization (14-26). DOI: 10.5555/3049832.3049835. Online publication date: 4-Feb-2017
    • (2017) Tapir. ACM SIGPLAN Notices 52:8 (249-265). DOI: 10.1145/3155284.3018758. Online publication date: 26-Jan-2017
    • (2017) Tapir. Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (249-265). DOI: 10.1145/3018743.3018758. Online publication date: 26-Jan-2017
    • (2017) Automatic detection of extended data-race-free regions. 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (14-26). DOI: 10.1109/CGO.2017.7863725. Online publication date: Feb-2017
    • (2016) Accelerating Dynamic Data Race Detection Using Static Thread Interference Analysis. Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores (30-39). DOI: 10.1145/2883404.2883405. Online publication date: 12-Mar-2016
    • (2015) Region-Based May-Happen-in-Parallel Analysis for C Programs. Proceedings of the 2015 44th International Conference on Parallel Processing (ICPP) (889-898). DOI: 10.1109/ICPP.2015.98. Online publication date: 1-Sep-2015
    • (2013) Interprocedural strength reduction of critical sections in explicitly-parallel programs. Proceedings of the 22nd international conference on Parallel architectures and compilation techniques (29-40). DOI: 10.5555/2523721.2523729. Online publication date: 7-Oct-2013
    • (2013) Online feedback-directed optimizations for parallel Java code. ACM SIGPLAN Notices 48:10 (713-728). DOI: 10.1145/2544173.2509518. Online publication date: 29-Oct-2013
