research-article

Automated derivation of parametric data movement lower bounds for affine programs

Authors: Auguste Olivry, Louis-Noël Pouchet, Fabrice Rastello

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 808 - 822

https://doi.org/10.1145/3385412.3385989

Published: 11 June 2020

Abstract

Researchers and practitioners have long worked on improving the computational complexity of algorithms, focusing on reducing the number of operations needed to perform a computation. However, current hardware trends clearly show that data movement costs more than computation in both performance and energy: high-quality algorithms must minimize data movement as much as possible.

The theoretical operational complexity of an algorithm is a function of the total number of operations that must be executed, regardless of the order in which they are actually executed. Theoretical data movement (or I/O) complexity is fundamentally different: one must consider all possible legal schedules of the operations to determine the minimal number of data movements achievable, a major theoretical challenge. I/O complexity has been studied via complex manual proofs; for example, the bound for matrix multiplication with a cache of size S was refined from Ω(n³/√S) by Hong & Kung to 2n³/√S by Smith et al. While asymptotic complexity may be sufficient to compare the I/O potential of broadly different algorithms, the accuracy of the reasoning depends on the tightness of these I/O lower bounds. Specifically, exposing constants is essential for precise comparison between different algorithms: for example, the 2n³/√S lower bound makes it possible to demonstrate the optimality of panel-panel tiling for matrix multiplication.
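The dependence of data movement on the schedule can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a fully associative LRU cache of S elements and counts only loads (misses), and the helper names `count_misses` and `matmul_accesses` are hypothetical. It runs the same n³ matrix-multiplication operations under two legal schedules, untiled and square-tiled, and compares the number of misses.

```python
from collections import OrderedDict
from itertools import product


def count_misses(accesses, S):
    """Count loads into a fully associative LRU cache holding S elements.

    Assumption: stores and write-backs are ignored; only misses are counted.
    """
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            cache[addr] = None
            if len(cache) > S:
                cache.popitem(last=False)  # evict the least recently used cell
    return misses


def matmul_accesses(n, b):
    """Yield the cells touched by C[i][j] += A[i][k] * B[k][j].

    The iteration space is cut into b x b x b tiles; b == n gives the
    plain untiled i, j, k loop nest. For every (i, j) the k values are
    still visited in increasing order, so the schedule is legal.
    """
    for ii, jj, kk in product(range(0, n, b), repeat=3):
        for i, j, k in product(range(ii, min(ii + b, n)),
                               range(jj, min(jj + b, n)),
                               range(kk, min(kk + b, n))):
            yield ("A", i, k)
            yield ("B", k, j)
            yield ("C", i, j)


n, S = 16, 64
untiled = count_misses(matmul_accesses(n, n), S)
tiled = count_misses(matmul_accesses(n, 4), S)  # three 4x4 tiles fit in S
print(f"operations: {n**3}, misses untiled: {untiled}, misses tiled: {tiled}")
```

Both schedules perform exactly the same operations, yet the tiled one moves less data. A lower bound on I/O has to hold for every such legal schedule, which is what makes it harder to derive than an operation count.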

We present the first static analysis to automatically derive non-asymptotic parametric expressions of data movement lower bounds with scaling constants, for arbitrary affine computations. Our approach is fully automatic, helping algorithm designers reason about I/O complexity and make educated decisions about algorithmic alternatives.
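As an illustration of why a parametric bound with explicit constants is useful, the sketch below evaluates the 2n³/√S expression quoted earlier in the abstract and compares it with a rough I/O estimate for a square-tiled schedule. The tiled cost model (three b×b tiles resident in cache, b = ⌊√(S/3)⌋, A and B tiles streamed, each C cell loaded and stored once) is an assumption made for this example, not a result from the paper, and the function names are hypothetical.

```python
import math


def matmul_io_lower_bound(n: int, S: int) -> float:
    """Evaluate the 2*n^3/sqrt(S) expression quoted in the abstract."""
    return 2.0 * n**3 / math.sqrt(S)


def square_tiled_io_estimate(n: int, S: int) -> float:
    """Rough I/O estimate for square tiling (illustrative assumption).

    Tiles of size b x b for A, B and C must fit together in cache,
    so 3*b*b <= S. A and B contribute about n^3/b loads each; C is
    loaded and stored once per cell.
    """
    b = math.isqrt(S // 3)
    if b < 1:
        raise ValueError("cache too small to hold three 1x1 tiles")
    return 2.0 * n**3 / b + 2.0 * n * n


n, S = 1024, 1024
bound = matmul_io_lower_bound(n, S)
estimate = square_tiled_io_estimate(n, S)
print(f"lower bound: {bound:.0f}")
print(f"square-tiled estimate: {estimate:.0f}")
print(f"ratio: {estimate / bound:.2f}")
```

With only an asymptotic Ω(n³/√S) bound, the gap between a schedule and the bound cannot be quantified; with the constant exposed, the ratio printed above shows how far this particular tiling model is from the bound.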

References

[1] Laksono Adhianto, S. Banerjee, Michael W. Fagan, Mark Krentel, Gabriel Marin, John M. Mellor-Crummey, and Nathan R. Tallent. 2010. HPCTOOLKIT: tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience 22, 6 (2010), 685–701.

[2] Alok Aggarwal and Jeffrey S. Vitter. 1988. The Input/Output Complexity of Sorting and Related Problems. Commun. ACM 31, 9 (1988), 1116–1127.

[3] Grey Ballard, Erin Carson, James Demmel, Mark Hoemmen, Nick Knight, and Oded Schwartz. 2014. Communication lower bounds and optimal algorithms for numerical linear algebra. Acta Numerica 23 (2014), 1–155.

[4] Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2011. Minimizing Communication in Numerical Linear Algebra. SIAM J. Matrix Analysis Applications 32, 3 (2011), 866–901.

[5] Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2012. Graph expansion and communication costs of fast matrix multiplication. J. ACM 59, 6 (2012), 32.

[6] Alexander I. Barvinok. 1994. A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed. Mathematics of Operations Research 19, 4 (1994), 769–779.

[7] Christian Bauer, Alexander Frink, and Richard Kreckel. 2002. Introduction to the GiNaC Framework for Symbolic Computation within the C++ Programming Language. J. Symbolic Computation 33 (2002), 1–12.

[8] Gianfranco Bilardi and Enoch Peserico. 2001. A characterization of temporal locality and its portability across memory hierarchies. Automata, Languages and Programming (2001), 128–139.

[9] Gianfranco Bilardi, Michele Scquizzato, and Francesco Silvestri. 2012. A Lower Bound Technique for Communication on BSP with Application to the FFT. In Euro-Par 2012 Parallel Processing - 18th International Conference, Euro-Par 2012, Rhodes Island, Greece, August 27-31, 2012. Proceedings. 676–687.

[10] Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon, and Katherine Yelick. 2013. Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays - Part 1. EECS Technical Report EECS-2013-61. UC Berkeley.

[11] James Demmel, Laura Grigori, Mark Hoemmen, and Julien Langou. 2012. Communication-optimal Parallel and Sequential QR and LU Factorizations. SIAM J. Scientific Computing 34, 1 (2012), A206–A239.

[12] Venmugil Elango, Fabrice Rastello, Louis-Noël Pouchet, J. Ramanujam, and P. Sadayappan. 2014. On characterizing the data movement complexity of computational DAGs for parallel execution. In Proc. of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’14, Prague, Czech Republic - June 23 - 25, 2014. 296–306.

[13] Venmugil Elango, Fabrice Rastello, Louis-Noël Pouchet, J. Ramanujam, and P. Sadayappan. 2015. On Characterizing the Data Access Complexity of Programs. In Proc. of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015. 567–580.

[14] Paul Feautrier. 1988. Parametric integer programming. RAIRO Recherche Opérationnelle 22, 3 (1988), 243–268.

[15] Paul Feautrier. 1992. Some efficient solutions to the affine scheduling problem. I. One-dimensional time. International Journal of Parallel Programming 21, 5 (1992), 313–347.

[16] Paul Feautrier and Christian Lengauer. 2011. Polyhedron model. In Encyclopedia of Parallel Computing. 1581–1592.

[17] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. 1999. Cache-Oblivious Algorithms. In Proc. of the 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, 17-18 October, 1999, New York, NY, USA. 285–298.

[18] Jia-Wei Hong and H. T. Kung. 1981. I/O complexity: The red-blue pebble game. In Proc. of the 13th Annual ACM Symposium on Theory of Computing (STOC ’81), May 11-13, 1981, Milwaukee, Wisconsin, USA. 326–333.

[19] Dror Irony, Sivan Toledo, and Alexandre Tiskin. 2004. Communication lower bounds for distributed-memory matrix multiplication. J. Parallel and Distrib. Comput. 64, 9 (2004), 1017–1026.

[20] Grzegorz Kwasniewski, Marko Kabic, Maciej Besta, Joost VandeVondele, Raffaele Solcà, and Torsten Hoefler. 2019. Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication. In Proc. of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019, Denver, Colorado, USA, November 17-19, 2019. 24:1–24:22.

[21] Lynn H. Loomis and Hassler Whitney. 1949. An inequality related to the isoperimetric inequality. Bull. Am. Math. Soc. 55 (1949), 961–962.

[22] Auguste Olivry, Julien Langou, Louis-Noël Pouchet, P. Sadayappan, and Fabrice Rastello. 2019. Automated Derivation of Parametric Data Movement Lower Bounds for Affine Programs. arXiv: cs.CC/1911.06664

[23] Louis-Noël Pouchet and Tomofumi Yuki. 2015. PolyBench/C 4.2. http://polybench.sf.net/.

[24] J. Ramanujam and P. Sadayappan. 1992. Tiling multidimensional iteration spaces for multicomputers. J. Parallel and Distrib. Comput. 16, 2 (1992), 108–230.

[25] Desh Ranjan, John E. Savage, and Mohammad Zubair. 2010. Upper and Lower I/O Bounds for Pebbling r-Pyramids. In Combinatorial Algorithms - 21st International Workshop, IWOCA 2010, London, UK, July 26-28, 2010, Revised Selected Papers. 107–120.

[26] Desh Ranjan, John E. Savage, and Mohammad Zubair. 2011. Strong I/O Lower Bounds for Binomial and FFT Computation Graphs. In Computing and Combinatorics. LNCS, Vol. 6842. 134–145.

[27] Desh Ranjan, John E. Savage, and Mohammad Zubair. 2012. Upper and lower I/O bounds for pebbling r-pyramids. J. Discrete Algorithms 14 (2012), 2–12.

[28] John E. Savage. 1995. Extending the Hong-Kung model to memory hierarchies. In Computing and Combinatorics. LNCS, Vol. 959. 270–281.

[29] John E. Savage and Mohammad Zubair. 2008. A unified model for multicore architectures. In Proc. of the 1st international forum on Next-generation multicore/manycore technologies, IFMT 2008, Cairo, Egypt, November 24-25, 2008. 9.

[30] Tyler Michael Smith, Bradley Lowery, Julien Langou, and Robert A. van de Geijn. 2019. A Tight I/O Lower Bound for Matrix Multiplication. arXiv: 1702.02017v2

[31] Volker Strassen. 1969. Gaussian elimination is not optimal. Numerische Mathematik 13, 4 (1969), 354–356.

[32] Sven Verdoolaege. 2010. ISL: An integer set library for the polyhedral model. In Mathematical Software–ICMS 2010. 299–302.

[33] Sven Verdoolaege. 2018. Integer Set Library: Manual. http://isl.gforge.inria.fr/manual.pdf.

[34] Sven Verdoolaege and Tobias Grosser. 2012. Polyhedral Extraction Tool. In Second International Workshop on Polyhedral Compilation Techniques (IMPACT’12).

[35] Samuel Williams, Andrew Waterman, and David Patterson. 2009.
Index Terms

Automated derivation of parametric data movement lower bounds for affine programs
1. Software and its engineering
  1. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Automated static analysis
2. Theory of computation
  1. Design and analysis of algorithms

Published In

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2020

1174 pages

ISBN:9781450376136

DOI:10.1145/3385412

General Chair: Alastair F. Donaldson, Imperial College London, UK
Program Chair: Emina Torlak, University of Washington, USA

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

PLDI '20

Sponsor:

SIGPLAN

PLDI '20: 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation

June 15 - 20, 2020

London, UK

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics

Total Citations: 18
Total Downloads: 597
Downloads (Last 12 months): 181
Downloads (Last 6 weeks): 30

Reflects downloads up to 05 Mar 2025


Cited By

Xia R, Cao L, Zhang H, Guo J, Guo X, Liu J, Wang H (2025). An Approach to Tight I/O Lower Bounds for Algorithms with Composite Procedures. Computing and Combinatorics, 152-163. Online publication date: 20-Feb-2025.
https://doi.org/10.1007/978-981-96-1093-8_13

Liu F, Zhu Y, Sun S, Ding C, Smith W, Hosseini K (2024). Parallel Loop Locality Analysis for Symbolic Thread Counts. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 219-232. Online publication date: 14-Oct-2024.
https://dl.acm.org/doi/10.1145/3656019.3676948

Canesche M, Rosário V, Borin E, Quintão Pereira F (2024). The Droplet Search Algorithm for Kernel Scheduling. ACM Transactions on Architecture and Code Optimization 21:2, 1-28. Online publication date: 21-May-2024.
https://dl.acm.org/doi/10.1145/3650109

Pouchet L, Tucker E, Zhang N, Chen H, Pal D, Rodríguez G, Zhang Z, Zhang Z, Putnam A (2024). Formal Verification of Source-to-Source Transformations for HLS. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 97-107. Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1145/3626202.3637563

Eyraud-Dubois L, Iooss G, Langou J, Rastello F, Agrawal K, Petrank E (2024). Tightening I/O Lower Bounds through the Hourglass Dependency Pattern. Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 183-193. Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3626183.3659986

Huang Q, Tsai P, Emer J, Parashar A (2024). Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 150-166. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00021

Reber B, Gould M, Kneipp A, Liu F, Prechtl I, Ding C, Chen L, Patru D (2023). Cache Programming for Scientific Loops Using Leases. ACM Transactions on Architecture and Code Optimization 20:3, 1-25. Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3600090

Al Daas H, Ballard G, Grigori L, Kumar S, Rouse K, Agrawal K, Shun J (2023). Parallel Memory-Independent Communication Bounds for SYRK. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 391-401. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3558481.3591072

Beaumont O, Collin J, Eyraud-Dubois L, Vérité M (2023). Data Distribution Schemes for Dense Linear Algebra Factorizations on Any Number of Nodes. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 390-401. Online publication date: May-2023.
https://doi.org/10.1109/IPDPS54959.2023.00047

Smith W, Goldfarb A, Ding C, Rauchwerger L, Cameron K, Nikolopoulos D, Pnevmatikatos D (2022). Beyond time complexity. Proceedings of the 36th ACM International Conference on Supercomputing, 1-12. Online publication date: 28-Jun-2022.
https://dl.acm.org/doi/10.1145/3524059.3532395
