ABSTRACT
This work proposes a novel algorithm and an Integer Linear Programming (ILP) formulation that optimize the pipelined code mapping of dataflow graphs generated by optimizing compilers under a given resource budget. The goal is to maximize the throughput of dataflow software pipelining under that budget, i.e., when the minimum number of FIFO buffers needed to optimally balance the dataflow graph is not available in the system. The proposed algorithm takes a two-fold approach: it combines a well-established optimal dataflow graph balancing ILP formulation, which does not consider resource budget constraints, with our proposed ILP formulation, which does. Our algorithm efficiently maximizes the throughput of the dataflow software pipeline under a given resource budget. Additionally, we introduce a cycle-accurate dataflow graph simulator for evaluating various balancing techniques. We experimentally evaluate different optimization techniques and show that our proposed algorithm performs relatively well compared to existing techniques.
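To make the notion of balancing concrete, the following is a minimal sketch and not the paper's algorithm or ILP. It assumes the classical unit-latency model, in which every node is assigned an integer level and an edge (u, v) needs `level[v] - level[u] - 1` FIFO buffers so that all paths between two nodes have equal length. The function names, the example graph, and the brute-force search are illustrative assumptions; the search merely stands in for an ILP solver on a tiny graph.

```python
from itertools import product

def asap_levels(nodes, edges):
    """Longest-path level per node; `nodes` must be in topological order."""
    level = {v: 0 for v in nodes}
    for u in nodes:
        for a, b in edges:
            if a == u:
                level[b] = max(level[b], level[u] + 1)
    return level

def total_buffers(level, edges):
    """Buffers needed when edge (u, v) must span level[v] - level[u] stages."""
    return sum(level[v] - level[u] - 1 for u, v in edges)

def min_buffers_bruteforce(nodes, edges):
    """Exhaustive minimum over level assignments in 0..n-1 (tiny graphs only)."""
    best = None
    for combo in product(range(len(nodes)), repeat=len(nodes)):
        level = dict(zip(nodes, combo))
        # A valid assignment places every consumer at least one stage after its producer.
        if all(level[v] - level[u] >= 1 for u, v in edges):
            cost = total_buffers(level, edges)
            if best is None or cost < best:
                best = cost
    return best

# Hypothetical example graph, listed in topological order.
nodes = ["s", "a", "b", "c", "d", "t"]
edges = [("s", "a"), ("a", "b"), ("b", "d"), ("d", "t"),
         ("s", "c"), ("c", "d"), ("c", "t")]

asap_cost = total_buffers(asap_levels(nodes, edges), edges)
optimal_cost = min_buffers_bruteforce(nodes, edges)
print(asap_cost)     # 3: scheduling every node as early as possible
print(optimal_cost)  # 2: delaying node c by one stage saves a buffer
```

In this example the balanced graph needs at least two buffers. If the available budget were smaller than that minimum, full balancing would be impossible, which is the situation the abstract targets.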
Index Terms
- Towards Maximum Throughput of Dataflow Software Pipeline under Resource Constraints