Towards Maximum Throughput of Dataflow Software Pipeline under Resource Constraints

Authors:
Siddhisanket Raskar, Argonne National Laboratory, Lemont, Illinois, USA (https://orcid.org/0000-0002-4832-0834)
Thomas Applencourt, Argonne National Laboratory, Lemont, Illinois, USA (https://orcid.org/0000-0001-7522-9449)
Kalyan Kumaran, Argonne National Laboratory, Lemont, Illinois, USA (https://orcid.org/0000-0002-6447-3195)
Guang Gao, University of Delaware, Newark, Delaware, USA (https://orcid.org/0000-0002-5265-7528)
PMAM'23: Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, February 2023, Pages 20–28
https://doi.org/10.1145/3582514.3582521

Published: 25 February 2023

ABSTRACT

This work proposes a novel algorithm and Integer Linear Programming (ILP) formulation to optimize, under a given resource budget, the pipelined code mapping of dataflow graphs generated by optimizing compilers. The goal of this optimization is to maximize the throughput of dataflow software pipelining under that budget, i.e., when the system cannot provide the minimum number of FIFO buffers needed to optimally balance the dataflow graph. The proposed algorithm takes a two-fold approach: it combines a well-established optimal dataflow graph balancing ILP formulation, which does not consider resource budget constraints, with our proposed ILP formulation, which does. Our algorithm efficiently maximizes the throughput of a dataflow software pipeline under a given resource budget. Additionally, we introduce a cycle-accurate dataflow graph simulator for evaluating various balancing techniques. We perform an experimental evaluation of different optimization techniques and show that our proposed algorithm performs relatively well compared to existing techniques.
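
To make the balancing problem in the abstract concrete, the following is a minimal sketch and not the algorithm or ILP formulation proposed in the paper. It assumes an acyclic dataflow graph whose actors each take one pipeline stage, and it balances the graph by giving every node its longest-path (as-soon-as-possible) level and placing `level[v] - level[u] - 1` FIFO buffers on each edge `(u, v)`. This yields a balanced graph but not necessarily one with the minimum number of buffers, which is what the optimal ILP formulation targets. The function names, the example graph, and the `budget` value are hypothetical.

```python
from collections import defaultdict, deque


def asap_levels(nodes, edges):
    """Return the longest-path level of each node in a DAG (sources at 0)."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    level = {n: 0 for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while ready:  # Kahn's topological traversal
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("graph has a cycle; this sketch handles DAGs only")
    return level


def buffers_for_balance(nodes, edges):
    """FIFO buffers per edge so that every edge spans exactly one stage.

    Feasible but not guaranteed minimal (assumption: unit-delay actors).
    """
    level = asap_levels(nodes, edges)
    return {(u, v): level[v] - level[u] - 1 for u, v in edges}


if __name__ == "__main__":
    # Hypothetical graph: a chain a -> b -> c -> d plus a shortcut a -> d.
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
    buffers = buffers_for_balance(nodes, edges)
    needed = sum(buffers.values())
    budget = 1  # hypothetical resource budget
    print(buffers)
    print("buffers needed:", needed, "fits budget:", needed <= budget)
```

In this example the shortcut edge needs two buffers to match the three-stage chain, so a budget of one buffer cannot fully balance the graph. That is the situation the abstract describes, in which throughput must be maximized under the budget instead.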

Index Terms

Computer systems organization → Architectures → Parallel architectures → Multicore architectures

Published in
PMAM'23: Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores
February 2023
73 pages
ISBN: 9798400701153
DOI: 10.1145/3582514
Program Co-chairs: Quan Chen, Zhiyi Huang, Min Si
Copyright © 2023 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dataflow model
dataflow software pipelining
integer linear programming
code mapping
optimizations
FIFO buffers
spatial architectures
exa-scale
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 53 of 97 submissions, 55%