Research article, DOI: 10.1145/2907950.2907957

Loop-oriented array- and field-sensitive pointer analysis for automatic SIMD vectorization

Published: 13 June 2016

Abstract

Compiler-based auto-vectorization is a promising way to automatically generate code that makes efficient use of SIMD processors in high-performance platforms and embedded systems. The two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs in order to vectorize isomorphic scalar instructions and/or reduce dynamic dependence checks incurred at runtime. The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (by using field-insensitive models, which are too imprecise in handling arrays and structs). This paper proposes an inter-procedural Loop-oriented Pointer Analysis, called LPA, for analyzing arrays and structs to support aggressive SLP and LLV optimizations. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a fine-grained memory model to generate location sets based on how structs and arrays are accessed. LPA can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating location set generation as an independent concern from the rest of the pointer analysis, LPA is designed to easily reuse existing points-to resolution algorithms. We evaluate LPA using SLP and LLV, the two classic vectorization techniques, on a set of 20 CPU2000/2006 benchmarks. For SLP, LPA enables it to vectorize a total of 133 more basic blocks, with an average of 12.09 per benchmark, resulting in the best speedup of 2.95% for 173.applu. For LLV, LPA reduces a total of 319 static bound checks, with an average of 22.79 per benchmark, resulting in the best speedup of 7.18% for 177.mesa.
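To make the contrast between field-insensitive and fine-grained memory models concrete, the following is a toy sketch in Python. It is not LPA's algorithm and is not taken from the paper: the `Access` record, its fields, and both `may_alias_*` helpers are hypothetical names introduced here for illustration. The sketch assumes a struct holding two four-element float arrays laid out back to back, and a loop that reads one array while writing the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical summary of one pointer access inside a loop."""
    obj: str      # abstract object (allocation site) the pointer refers to
    offset: int   # byte offset of the first element accessed
    stride: int   # bytes advanced per iteration (assumed >= 0)
    size: int     # bytes touched by each access
    trips: int    # loop trip count (assumed >= 1)

    def byte_range(self):
        # Half-open range [lo, hi) of bytes touched over the whole loop.
        last_start = self.offset + self.stride * (self.trips - 1)
        return self.offset, last_start + self.size


def may_alias_field_insensitive(a, b):
    # One abstract object per allocation site: any two accesses to the
    # same object are reported as possibly aliasing.
    return a.obj == b.obj


def may_alias_range_based(a, b):
    # Finer-grained view: accesses to the same object alias only if the
    # byte ranges they cover in the loop overlap. This is conservative,
    # since interleaved but disjoint strides are still reported.
    if a.obj != b.obj:
        return False
    a_lo, a_hi = a.byte_range()
    b_lo, b_hi = b.byte_range()
    return a_lo < b_hi and b_lo < a_hi


# struct { float a[4]; float b[4]; } s;  loop: s.b[i] = s.a[i] for i in 0..3
read_a = Access(obj="s", offset=0, stride=4, size=4, trips=4)
write_b = Access(obj="s", offset=16, stride=4, size=4, trips=4)

print(may_alias_field_insensitive(read_a, write_b))
print(may_alias_range_based(read_a, write_b))
```

Under these assumptions the field-insensitive check reports a possible alias because both accesses hit the same object, while the range-based check separates bytes [0, 16) from bytes [16, 32). A vectorizer given the second answer would not need to guard such a loop with a runtime overlap check.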

    Published In

    LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems
    June 2016
    122 pages
    ISBN: 9781450343169
    DOI: 10.1145/2907950
    • ACM SIGPLAN Notices, Volume 51, Issue 5
      LCTES '16
      May 2016
      122 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2980930
      • Editor: Andy Gill
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Array
    2. Field
    3. Loop
    4. Pointer Analysis
    5. SIMD

    Qualifiers

    • Research-article

    Conference

    LCTES'16

    Acceptance Rates

    Overall Acceptance Rate 116 of 438 submissions, 26%

    Cited By

    • (2023) High Performance and Power Efficient Accelerator for Cloud Inference. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 1003-1016. DOI: 10.1109/HPCA56546.2023.10070941. Online publication date: Feb-2023
    • (2020) A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism. ACM Transactions on Embedded Computing Systems 19:6, pages 1-27. DOI: 10.1145/3391898. Online publication date: 3-Oct-2020
    • (2019) Boosting SIMD Benefits through a Run-time and Energy Efficient DLP Detection. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 722-727. DOI: 10.23919/DATE.2019.8714826. Online publication date: Mar-2019
    • (2019) Optimizing data permutations in structured loads/stores translation and SIMD register mapping for a cross-ISA dynamic binary translator. Journal of Systems Architecture: the EUROMICRO Journal 98:C, pages 173-190. DOI: 10.1016/j.sysarc.2019.07.008. Online publication date: 1-Sep-2019
    • (2018) Runtime Vectorization of Conditional Code and Dynamic Range Loops to ARM NEON Engine. 2018 VIII Brazilian Symposium on Computing Systems Engineering (SBESC), pages 67-74. DOI: 10.1109/SBESC.2018.00019. Online publication date: Nov-2018
    • (2018) Combining range and inequality information for pointer disambiguation. Science of Computer Programming 152:C, pages 161-184. DOI: 10.1016/j.scico.2017.10.014. Online publication date: 15-Jan-2018
    • (2018) Efficient and retargetable SIMD translation in a dynamic binary translator. Software: Practice and Experience 48:6, pages 1312-1330. DOI: 10.1002/spe.2573. Online publication date: 27-Feb-2018
    • (2017) Pointer disambiguation via strict inequalities. Proceedings of the 2017 International Symposium on Code Generation and Optimization, pages 134-147. DOI: 10.5555/3049832.3049848. Online publication date: 4-Feb-2017
    • (2017) Auto-vectorization for image processing DSLs. ACM SIGPLAN Notices 52:5, pages 21-30. DOI: 10.1145/3140582.3081039. Online publication date: 21-Jun-2017
    • (2017) Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions. ACM SIGPLAN Notices 52:5, pages 31-40. DOI: 10.1145/3140582.3081029. Online publication date: 21-Jun-2017
