DOI: 10.1145/2442516.2442529
research-article

From relational verification to SIMD loop synthesis

Published: 23 February 2013

Abstract

Existing pattern-based compiler technology cannot fully exploit the potential of SIMD architectures. We present a new program-synthesis-based technique for auto-vectorizing performance-critical innermost loops. Our synthesis technique is applicable to a wide range of loops, consistently produces performant SIMD code, and generates correctness proofs for the output code. The synthesis technique, which leverages existing work on relational verification methods, is a novel combination of deductive loop restructuring, synthesis condition generation, and a new inductive synthesis algorithm for producing loop-free code fragments. The inductive synthesis algorithm wraps an optimized depth-first exploration of code sequences inside a counterexample-guided inductive synthesis (CEGIS) loop. Our technique quickly produces SIMD implementations (up to 9 instructions in 0.12 seconds) for a wide range of fundamental looping structures. The resulting SIMD implementations outperform the original loops by 2.0x to 3.7x.
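The inductive step described above, a depth-first enumeration of loop-free instruction sequences wrapped in a CEGIS loop, can be illustrated with a small Python sketch. It is not the authors' algorithm: the four-lane toy instruction set (`vadd`, `vsub`, `vmul`, `vswap`, `vrev`), the example scalar loop `out[i] = a[i ^ 1] + b[i]`, and the length bound are all hypothetical, the search has none of the paper's optimizations, and `verify` uses seeded random testing as a stand-in for the paper's verifier, so it can refute a candidate but cannot prove it correct.

```python
import itertools
import random

LANES = 4  # vector width assumed for this sketch (e.g. four 32-bit lanes)

# Hypothetical toy SIMD instruction set: name -> (arity, lane-wise semantics).
# Loosely modeled on SSE-style arithmetic and shuffles; not a real ISA.
OPS = {
    "vadd": (2, lambda x, y: tuple(p + q for p, q in zip(x, y))),
    "vsub": (2, lambda x, y: tuple(p - q for p, q in zip(x, y))),
    "vmul": (2, lambda x, y: tuple(p * q for p, q in zip(x, y))),
    "vswap": (1, lambda x: (x[1], x[0], x[3], x[2])),  # swap adjacent lanes
    "vrev": (1, lambda x: x[::-1]),                     # reverse lane order
}


def spec(a, b):
    """Scalar reference loop (hypothetical): out[i] = a[i ^ 1] + b[i]."""
    out = []
    for i in range(LANES):
        out.append(a[i ^ 1] + b[i])
    return tuple(out)


def run(program, inputs):
    """Execute a loop-free program; registers 0..k-1 hold the inputs."""
    regs = list(inputs)
    for op, args in program:
        _, fn = OPS[op]
        regs.append(fn(*(regs[r] for r in args)))
    return regs[-1]  # the last instruction's result is the output


def dfs(examples, length):
    """Depth-first search for a program of exactly `length` instructions
    whose output matches every (inputs, expected) pair in `examples`."""

    def go(program, regs):
        # regs[e] is the register file for example e, extended as we descend.
        if len(program) == length:
            if all(r[-1] == out for r, (_, out) in zip(regs, examples)):
                return list(program)
            return None
        n_regs = len(regs[0])
        for op, (arity, fn) in OPS.items():
            for args in itertools.product(range(n_regs), repeat=arity):
                # Evaluate the new instruction incrementally on all examples.
                new_regs = [r + [fn(*(r[i] for i in args))] for r in regs]
                program.append((op, args))
                found = go(program, new_regs)
                program.pop()
                if found is not None:
                    return found
        return None

    return go([], [list(inputs) for inputs, _ in examples])


def rand_vec(rng):
    return tuple(rng.randint(-50, 50) for _ in range(LANES))


def verify(program, trials=2000, seed=1):
    """Stand-in verifier: seeded random testing. Returns a counterexample
    input pair, or None if none was found (this is NOT a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rand_vec(rng), rand_vec(rng)
        if run(program, (a, b)) != spec(a, b):
            return (a, b)
    return None


def cegis(max_len=3, seed=0):
    """CEGIS loop: synthesize from examples, verify, add counterexamples."""
    rng = random.Random(seed)
    first = (rand_vec(rng), rand_vec(rng))
    examples = [(first, spec(*first))]
    while True:
        candidate = None
        for length in range(1, max_len + 1):  # iterative deepening
            candidate = dfs(examples, length)
            if candidate is not None:
                break
        if candidate is None:
            return None  # no program within the length bound
        cex = verify(candidate)
        if cex is None:
            return candidate
        examples.append((cex, spec(*cex)))


if __name__ == "__main__":
    print(cegis())
```

Under these assumptions, `cegis()` should settle on a two-instruction program that swaps adjacent lanes of `a` and then adds `b`. Iterative deepening means shorter sequences are always tried first, and each counterexample joins the example set, so a rejected candidate cannot be proposed again.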


Information

Published In

PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2013
332 pages
ISBN:9781450319225
DOI:10.1145/2442516
  • ACM SIGPLAN Notices, Volume 48, Issue 8
    PPoPP '13
    August 2013
    309 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2517327
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. program vectorization
  2. relational verification
  3. synthesis

Conference

PPoPP '13

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 28
  • Downloads (last 6 weeks): 3
Reflects downloads up to 30 Jan 2025.

Citations

Cited By

  • (2024) Evolving to Find Optimizations Humans Miss: Using Evolutionary Computation to Improve GPU Code for Bioinformatics Applications. ACM Transactions on Evolutionary Learning and Optimization 4:4 (1-29). DOI: 10.1145/3703920. Online publication date: 15-Nov-2024.
  • (2024) Programming-by-Demonstration for Long-Horizon Robot Tasks. Proceedings of the ACM on Programming Languages 8:POPL (512-545). DOI: 10.1145/3632860. Online publication date: 5-Jan-2024.
  • (2024) Boost Linear Algebra Computation Performance via Efficient VNNI Utilization. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (149-163). DOI: 10.1145/3620666.3651333. Online publication date: 27-Apr-2024.
  • (2023) Fast Instruction Selection for Fast Digital Signal Processing. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4 (125-137). DOI: 10.1145/3623278.3624768. Online publication date: 25-Mar-2023.
  • (2023) Toward Programming Languages for Reasoning: Humans, Symbolic Systems, and AI Agents. Proceedings of the 2023 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (136-152). DOI: 10.1145/3622758.3622895. Online publication date: 18-Oct-2023.
  • (2023) Faster sorting algorithms discovered using deep reinforcement learning. Nature 618:7964 (257-263). DOI: 10.1038/s41586-023-06004-9. Online publication date: 7-Jun-2023.
  • (2022) Understanding the Power of Evolutionary Computation for GPU Code Optimization. 2022 IEEE International Symposium on Workload Characterization (IISWC) (185-198). DOI: 10.1109/IISWC55918.2022.00025. Online publication date: Nov-2022.
  • (2022) Verification of Vectorization of Signal Transforms. Languages and Compilers for Parallel Computing (215-231). DOI: 10.1007/978-3-030-95953-1_15. Online publication date: 16-Feb-2022.
  • (2021) Porcupine: a synthesizing compiler for vectorized homomorphic encryption. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (375-389). DOI: 10.1145/3453483.3454050. Online publication date: 19-Jun-2021.
  • (2021) Vectorization for digital signal processors via equality saturation. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (874-886). DOI: 10.1145/3445814.3446707. Online publication date: 19-Apr-2021.
