DOI: 10.1145/3620665.3640392
research-article
Open access

SEER: Super-Optimization Explorer for High-Level Synthesis using E-graph Rewriting

Published: 27 April 2024

Abstract

High-level synthesis (HLS) is a process that automatically translates a software program in a high-level language into a low-level hardware description. However, the hardware designs produced by HLS tools still suffer from a significant performance gap compared to manual implementations. This is because the input HLS programs must still be written using hardware design principles.
Existing techniques either leave the program source unchanged or perform a fixed sequence of source transformation passes, potentially missing opportunities to find the optimal design. We propose a super-optimization approach for HLS that automatically rewrites an arbitrary software program into efficient HLS code that can be used to generate an optimized hardware design. We developed a toolflow named SEER, based on the e-graph data structure, to efficiently explore equivalent implementations of a program at scale. SEER provides an extensible framework, orchestrating existing software compiler passes and hardware optimizers.
Our work is the first attempt to exploit e-graph rewriting for large software compiler frameworks, such as MLIR. Across a set of open-source benchmarks, we show that SEER achieves up to 38× the performance within 1.4× the area of the original program. Via an Intel-provided case study, SEER demonstrates the potential to outperform manually optimized designs produced by hardware experts.
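To make the e-graph idea concrete, the following is a minimal sketch of equality saturation on a toy arithmetic expression. It is not SEER's implementation and does not use MLIR: the `EGraph` class, the `mul2_to_shl` rule, and the `COST` table are all hypothetical names and values chosen for illustration. The sketch stores equivalent expressions in shared e-classes, applies one rewrite (`a * 2` to `a << 1`) until nothing changes, and then extracts the cheapest equivalent term.

```python
class EGraph:
    """Toy e-graph: e-nodes are tuples (op, child_class_id, ...)."""

    def __init__(self):
        self.parent = []    # union-find over e-class ids
        self.classes = {}   # root e-class id -> set of e-nodes
        self.hashcons = {}  # canonical e-node -> e-class id

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def canon(self, node):
        op, *kids = node
        return (op, *[self.find(k) for k in kids])

    def add(self, op, *kids):
        node = self.canon((op, *kids))
        if node in self.hashcons:
            return self.find(self.hashcons[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.classes[cid] = {node}
        self.hashcons[node] = cid
        return cid

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return False
        self.parent[b] = a
        self.classes[a] |= self.classes.pop(b)
        return True

    def rebuild(self):
        # Restore congruence: identical canonical e-nodes must share a class.
        while True:
            seen, merges = {}, []
            for cid, nodes in self.classes.items():
                for n in nodes:
                    c = self.canon(n)
                    if c in seen and seen[c] != cid:
                        merges.append((seen[c], cid))
                    seen.setdefault(c, cid)
            self.classes = {cid: {self.canon(n) for n in nodes}
                            for cid, nodes in self.classes.items()}
            self.hashcons = seen
            if not any([self.union(a, b) for a, b in merges]):
                return


def mul2_to_shl(eg):
    """Hypothetical rewrite rule: a * 2  ==>  a << 1. Returns True if it added anything."""
    matches = []
    for cid, nodes in eg.classes.items():
        for op, *kids in nodes:
            if op == "*" and ("2",) in eg.classes[eg.find(kids[1])]:
                matches.append((cid, kids[0]))
    changed = False
    for cid, a in matches:
        changed |= eg.union(cid, eg.add("<<", a, eg.add("1")))
    eg.rebuild()
    return changed


COST = {"*": 4, "<<": 1}  # made-up relative costs; leaves cost 0


def extract(eg, root):
    """Pick the cheapest term in each e-class by iterating to a fixpoint."""
    best = {}  # e-class id -> (cost, printed term)
    changed = True
    while changed:
        changed = False
        for cid, nodes in eg.classes.items():
            for op, *kids in nodes:
                kids = [eg.find(k) for k in kids]
                if any(k not in best for k in kids):
                    continue
                cost = COST.get(op, 0) + sum(best[k][0] for k in kids)
                term = op if not kids else "(" + f" {op} ".join(best[k][1] for k in kids) + ")"
                if cid not in best or cost < best[cid][0]:
                    best[cid] = (cost, term)
                    changed = True
    return best[eg.find(root)]


eg = EGraph()
x, two = eg.add("x"), eg.add("2")
root = eg.add("*", eg.add("*", x, two), two)  # (x * 2) * 2

while mul2_to_shl(eg):  # saturate: stop once the rule adds nothing new
    pass

print(extract(eg, root))  # (2, '((x << 1) << 1)')
```

Because the original and rewritten forms coexist in the same e-class, no rewrite is destructive and the choice of the final form is deferred to extraction, which is what lets this style of exploration sidestep fixed pass orderings.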

Published In

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
April 2024
1299 pages
ISBN:9798400703850
DOI:10.1145/3620665
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ASPLOS '24

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 744
  • Downloads (Last 12 months): 744
  • Downloads (Last 6 weeks): 99
Reflects downloads up to 19 Feb 2025
