Loop-oriented array- and field-sensitive pointer analysis for automatic SIMD vectorization

Authors:

Jingling Xue

LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems

Pages 41 - 51

https://doi.org/10.1145/2907950.2907957

Published: 13 June 2016

Abstract

Compiler-based auto-vectorization is a promising solution to automatically generate code that makes efficient use of SIMD processors in high-performance platforms and embedded systems. Two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs in order to vectorize isomorphic scalar instructions and/or reduce dynamic dependence checks incurred at runtime. The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (by using field-insensitive models, which are too imprecise in handling arrays and structs). This paper proposes an inter-procedural Loop-oriented Pointer Analysis, called LPA, for analyzing arrays and structs to support aggressive SLP and LLV optimizations. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a fine-grained memory model to generate location sets based on how structs and arrays are accessed. LPA can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating the location set generation as an independent concern from the rest of the pointer analysis, LPA is designed to easily reuse existing points-to resolution algorithms. We evaluate LPA using SLP and LLV, the two classic vectorization techniques, on a set of 20 CPU2000/2006 benchmarks. For SLP, LPA enables it to vectorize a total of 133 more basic blocks, with an average of 12.09 per benchmark, resulting in the best speedup of 2.95% for 173.applu. For LLV, LPA has reduced a total of 319 static bound checks, with an average of 22.79 per benchmark, resulting in the best speedup of 7.18% for 177.mesa.
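
As a rough illustration of why field sensitivity matters to a vectorizer, the toy sketch below contrasts a field-insensitive alias query with a field-sensitive one. It is not LPA's algorithm: the `LocationSet` class, its `(base, stride, count, size)` encoding, the brute-force byte enumeration, and the struct layout are all assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationSet:
    """Toy location set (hypothetical encoding, not the paper's definition).

    Describes accesses to object `obj` at byte offsets
    base + k * stride for k in range(count), each `size` bytes wide.
    """
    obj: str
    base: int
    stride: int
    count: int
    size: int

    def bytes_touched(self):
        # Brute-force enumeration of every byte offset the accesses cover.
        touched = set()
        for k in range(self.count):
            start = self.base + k * self.stride
            touched.update(range(start, start + self.size))
        return touched


def may_alias_field_insensitive(x, y):
    # One abstract object per allocation site: any two accesses
    # into the same object are conservatively assumed to alias.
    return x.obj == y.obj


def may_alias_field_sensitive(x, y):
    # Same object AND at least one byte in common.
    return x.obj == y.obj and bool(x.bytes_touched() & y.bytes_touched())


# Assumed layout: struct S { float a[4]; float b[4]; } s;  with 4-byte floats.
s_a = LocationSet("s", base=0, stride=4, count=4, size=4)       # s.a[0..3]
s_b = LocationSet("s", base=16, stride=4, count=4, size=4)      # s.b[0..3]
# Hypothetical access that starts at s.a[2] and runs into s.b.
s_a_tail = LocationSet("s", base=8, stride=4, count=4, size=4)

print(may_alias_field_insensitive(s_a, s_b))   # same object, so "may alias"
print(may_alias_field_sensitive(s_a, s_b))     # disjoint byte ranges
print(may_alias_field_sensitive(s_a_tail, s_b))
```

In this toy model the field-insensitive query cannot separate `s.a` from `s.b`, so a vectorizer relying on it would have to stay conservative or fall back to a runtime check, whereas the field-sensitive query shows the two arrays occupy disjoint bytes. A real analysis would reason about offsets and strides symbolically rather than enumerating bytes.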

Index Terms

Theory of computation → Semantics and reasoning → Program semantics

Published In

LCTES 2016: Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, Compilers, Tools, and Theory for Embedded Systems

June 2016

122 pages

ISBN:9781450343169

DOI:10.1145/2907950

General Chair: Tei-Wei Kuo
Program Chair: David B. Whalley

ACM SIGPLAN Notices Volume 51, Issue 5
LCTES '16
May 2016
122 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2980930
Editor: Andy Gill, University of Kansas, Lawrence, KS

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

LCTES'16: SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2016

June 13 - 14, 2016

Santa Barbara, CA, USA

Acceptance Rates

Overall Acceptance Rate 116 of 438 submissions, 26%

Bibliometrics

Total Citations: 20
Total Downloads: 292
Downloads (Last 12 months): 22
Downloads (Last 6 weeks): 2

Reflects downloads up to 17 Feb 2025

Citations

Cited By

Yao J, Zhou H, Zhang Y, Li Y, Feng C, Chen S, Chen J, Wang Y, Hu Q (2023). High Performance and Power Efficient Accelerator for Cloud Inference. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 1003-1016. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10070941
Latifis I, Parashar K, Dimitroulakos G, Cappelle H, Lezos C, Masselos K, Catthoor F (2020). A Retargetable MATLAB-to-C Compiler Exploiting Custom Instructions and Data Parallelism. ACM Transactions on Embedded Computing Systems, 19(6), pages 1-27. Online publication date: 3-Oct-2020.
https://dl.acm.org/doi/10.1145/3391898
Jordan M, Knorst T, Vicenzi J, Rutzig M (2019). Boosting SIMD Benefits through a Run-time and Energy Efficient DLP Detection. 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 722-727. Online publication date: Mar-2019.
https://doi.org/10.23919/DATE.2019.8714826
Fu S, Hong D, Liu Y, Wu J, Hsu W (2019). Optimizing data permutations in structured loads/stores translation and SIMD register mapping for a cross-ISA dynamic binary translator. Journal of Systems Architecture: the EUROMICRO Journal, 98(C), pages 173-190. Online publication date: 1-Sep-2019.
https://dl.acm.org/doi/10.1016/j.sysarc.2019.07.008
Jordan M, Knorst T, Vicenzi J, Rutzig M (2018). Runtime Vectorization of Conditional Code and Dynamic Range Loops to ARM NEON Engine. 2018 VIII Brazilian Symposium on Computing Systems Engineering (SBESC), pages 67-74. Online publication date: Nov-2018.
https://doi.org/10.1109/SBESC.2018.00019
Maalej M, Paisante V, Magno Quinto Pereira F, Gonnord L (2018). Combining range and inequality information for pointer disambiguation. Science of Computer Programming, 152(C), pages 161-184. Online publication date: 15-Jan-2018.
https://dl.acm.org/doi/10.1016/j.scico.2017.10.014
Fu S, Hong D, Liu Y, Wu J, Hsu W (2018). Efficient and retargetable SIMD translation in a dynamic binary translator. Software: Practice and Experience, 48(6), pages 1312-1330. Online publication date: 27-Feb-2018.
https://doi.org/10.1002/spe.2573
Maalej M, Paisante V, Ramos P, Gonnord L, Pereira F, Reddi V, Smith A, Tang L (2017). Pointer disambiguation via strict inequalities. Proceedings of the 2017 International Symposium on Code Generation and Optimization, pages 134-147. Online publication date: 4-Feb-2017.
https://dl.acm.org/doi/10.5555/3049832.3049848
Reiche O, Kobylko C, Hannig F, Teich J (2017). Auto-vectorization for image processing DSLs. ACM SIGPLAN Notices, 52(5), pages 21-30. Online publication date: 21-Jun-2017.
https://dl.acm.org/doi/10.1145/3140582.3081039
Fu S, Hong D, Liu Y, Wu J, Hsu W (2017). Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions. ACM SIGPLAN Notices, 52(5), pages 31-40. Online publication date: 21-Jun-2017.
https://dl.acm.org/doi/10.1145/3140582.3081029
