skip to main content
10.1145/3582016.3582070acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

Published: 25 March 2023 Publication History

Abstract

The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy-efficiency of an application running on the CGRA. This paper presents APEX, an automated approach for generating specialized PE architectures for an application or an application domain. APEX first analyzes application domain benchmarks using frequent subgraph mining to extract commonly occurring computational subgraphs. APEX then generates specialized PEs by merging subgraphs using a datapath graph merging algorithm. The merged datapath graphs are translated into a PE specification from which we automatically generate the PE hardware description in Verilog along with a compiler that maps applications to the PE. The PE hardware and compiler are inserted into a flexible CGRA generation and compilation toolchain that allows for agile evaluation of CGRAs. We evaluate APEX for two domains, machine learning and image processing. For image processing applications, our automatically generated CGRAs with specialized PEs achieve from 5% to 30% less area and from 22% to 46% less energy compared to a general-purpose CGRA. For machine learning applications, our automatically generated CGRAs consume 16% to 59% less energy and 22% to 39% less area than a general-purpose CGRA. This work paves the way for creation of application domain-driven design-space exploration frameworks that automatically generate efficient programmable accelerators, with a much lower design effort for both hardware and compiler generation.

References

[1]
Giovanni Ansaloni, Paolo Bonzini, and Laura Pozzi. 2008. Design and Architectural Exploration of Expression-Grained Reconfigurable Arrays. In 2008 Symposium on Application Specific Processors. https://doi.org/10.1109/SASP.2008.4570782
[2]
Kubilay Atasu, Laura Pozzi, and Paolo Ienne. 2003. Automatic Application-Specific Instruction-Set Extensions under Microarchitectural Constraints. In Proceedings of the 40th Annual Design Automation Conference (DAC ’03). Association for Computing Machinery, New York, NY, USA. 256–261. isbn:1581136889 https://doi.org/10.1145/775832.775897
[3]
Rick Bahr, Clark Barrett, Nikhil Bhagdikar, Alex Carsello, Ross Daly, Caleb Donovick, David Durst, Kayvon Fatahalian, Kathleen Feng, Pat Hanrahan, Teguh Hofstee, Mark Horowitz, Dillon Huff, Fredrik Kjolstad, Taeyoung Kong, Qiaoyi Liu, Makai Mann, Jackson Melchert, Ankita Nayak, Aina Niemetz, Gedeon Nyengele, Priyanka Raina, Stephen Richardson, Raj Setaluri, Jeff Setter, Kavya Sreedhar, Maxwell Strange, James Thomas, Christopher Torng, Leonard Truong, Nestan Tsiskaridze, and Keyi Zhang. 2020. Creating an Agile Hardware Design Flow. In 2020 57th ACM/IEEE Design Automation Conference (DAC). https://doi.org/10.1109/DAC18072.2020.9218553
[4]
Thilini Kaushalya Bandara, Dhananjaya Wijerathne, Tulika Mitra, and Li-Shiuan Peh. 2022. REVAMP: A Systematic Framework for Heterogeneous CGRA Realization. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’22). Association for Computing Machinery, New York, NY, USA. isbn:9781450392051 https://doi.org/10.1145/3503222.3507772
[5]
Clark Barrett and Cesare Tinelli. 2018. Satisfiability Modulo Theories. In Handbook of Model Checking, Edmund M. Clarke, Thomas A. Henzinger, Helmut Veith, and Roderick Bloem (Eds.). Springer International Publishing. isbn:978-3-319-10575-8 https://doi.org/10.1007/978-3-319-10575-8_11
[6]
Eli Bendersky. 2013. A Deeper Look into the LLVM Code Generator, Part 1. https://eli.thegreenplace.net/2013/02/25/a-deeper-look-into-the-llvm-code-generator-part-1
[7]
Robert Brummayer, Armin Biere, and Florian Lonsing. 2008. BTOR: Bit-Precise Modelling of Word-Level Problems for Model Checking. In Proceedings of the Joint Workshops of the 6th International Workshop on Satisfiability Modulo Theories and 1st International Workshop on Bit-Precise Reasoning (SMT ’08/BPR ’08). Association for Computing Machinery, New York, NY, USA. isbn:9781605584409 https://doi.org/10.1145/1512464.1512472
[8]
Pierre-Yves Calland, Anne Mignotte, Olivier Peyran, Yves Robert, and Frédéric Vivien. 1998. Retiming DAGs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, https://doi.org/10.1109/43.736571
[9]
Hong Cheng, Xifeng Yan, and Jiawei Han. 2010. Mining Graph Patterns. Springer US, Boston, MA. isbn:978-1-4419-6045-0 https://doi.org/10.1007/978-1-4419-6045-0_12
[10]
Jason Cong, Yiping Fan, Guoling Han, and Zhiru Zhang. 2004. Application-Specific Instruction Generation for Configurable Processor Architectures. In Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays (FPGA ’04). Association for Computing Machinery, New York, NY, USA. isbn:1581138296 https://doi.org/10.1145/968280.968307
[11]
Ross Daly, Caleb Donovick, Jackson Melchert, Rajsekhar Setaluri, Nestan Tsiskaridze Bullock, Priyanka Raina, Clark Barrett, and Pat Hanrahan. 2022. Synthesizing Instruction Selection Rewrite Rules from RTL using SMT. In Conference on Formal Methods in Computer-Aided Design (FMCAD). 139–150. https://doi.org/10.34727/2022/isbn.978-3-85448-053-2_20
[12]
Ross Daly, Leonard Truong, and Pat Hanrahan. 2018. Invoking and Linking Generators from Multiple Hardware Languages using CoreIR. In Workshop on Open-Source EDA Technology (WOSET). https://woset-workshop.github.io/PDFs/2018/a11.pdf
[13]
Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis. 2014. GraMi: Frequent Subgraph and Pattern Mining in a Single Large Graph. Proc. VLDB Endow., issn:2150-8097 https://doi.org/10.14778/2732286.2732289
[14]
Robert. B. Hitchcock, Gordon L. Smith, and David D. Cheng. 1982. Timing Analysis of Computer Hardware. IBM Journal of Research and Development, https://doi.org/10.1147/rd.261.0100
[15]
Dillon Huff, Steve Dai, and Pat Hanrahan. 2021. Clockwork: Resource-Efficient Static Scheduling for Multi-Rate Image Processing Applications on FPGAs. In The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’21). Association for Computing Machinery, New York, NY, USA. isbn:9781450382182 https://doi.org/10.1145/3431920.3439457
[16]
Kalhan Koul, Jackson Melchert, Kavya Sreedhar, Leonard Truong, Gedeon Nyengele, Keyi Zhang, Qiaoyi Liu, Jeff Setter, Po-Han Chen, Yuchen Mei, Maxwell Strange, Ross Daly, Caleb Donovick, Alex Carsello, Taeyoung Kong, Kathleen Feng, Dillon Huff, Ankita Nayak, Rajsekhar Setaluri, James Thomas, Nikhil Bhagdikar, David Durst, Zachary Myers, Nestan Tsiskaridze, Stephen Richardson, Rick Bahr, Kayvon Fatahalian, Pat Hanrahan, Clark Barrett, Mark Horowitz, Christopher Torng, Fredrik Kjolstad, and Priyanka Raina. 2022. AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers. ACM Transactions on Embedded Computing Systems (TECS), July, issn:1539-9087 https://doi.org/10.1145/3534933
[17]
Sihao Liu, Jian Weng, Dylan Kupsh, Atefeh Sohrabizadeh, Zhengrong Wang, Licheng Guo, Jiuyang Liu, Maxim Zhulin, Rishabh Mani, Lucheng Zhang, Jason Cong, and Tony Nowatzki. 2022. OverGen: Improving FPGA Usability through Domain-specific Overlay Generation. In 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO). https://doi.org/10.1109/MICRO56248.2022.00018
[18]
Nahri Moreano, Edson Borin, Cid C. de Souza, and Guido Araujo. 2005. Efficient datapath merging for partially reconfigurable architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, https://doi.org/10.1109/TCAD.2005.850844
[19]
Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. 2017. Plasticine: A Reconfigurable Architecture for Parallel Patterns. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). https://doi.org/10.1145/3079856.3080256
[20]
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. SIGPLAN Not., issn:0362-1340 https://doi.org/10.1145/2499370.2462176
[21]
Edward Rosten and Tom Drummond. 2006. Machine Learning for High-Speed Corner Detection. In Computer Vision – ECCV 2006, Aleš Leonardis, Horst Bischof, and Axel Pinz (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg. isbn:978-3-540-33833-8 https://doi.org/10.1007/11744023_34
[22]
Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel Emer, C. Thomas Gray, Brucek Khailany, and Stephen W. Keckler. 2019. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. In MICRO. isbn:9781450369381 https://doi.org/10.1145/3352460.3358302
[23]
Cheng Tan, Chenhao Xie, Ang Li, Kevin J. Barker, and Antonino Tumeo. 2021. AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators. In 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). https://doi.org/10.23919/DATE51398.2021.9473955
[24]
Russell Tessier, Kenneth Pocek, and André DeHon. 2015. Reconfigurable Computing Architectures. Proc. IEEE, https://doi.org/10.1109/JPROC.2014.2386883
[25]
Lenny Truong and Pat Hanrahan. 2019. A Golden Age of Hardware Description Languages: Applying Programming Language Techniques to Improve Design Productivity. In 3rd Summit on Advances in Programming Languages, SNAPL 2019, May 16-17, 2019, Providence, RI, USA, Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi (Eds.) (LIPIcs). Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.SNAPL.2019.7
[26]
Artem Vasilyev, Nikhil Bhagdikar, Ardavan Pedram, Stephen Richardson, Shahar Kvatinsky, and Mark Horowitz. 2016. Evaluating Programmable Architectures for Imaging and Vision Applications. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). https://doi.org/10.1109/MICRO.2016.7783755
[27]
Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. 2010. Conservation Cores: Reducing the Energy of Mature Computations. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery, New York, NY, USA. isbn:9781605588391 https://doi.org/10.1145/1736020.1736044
[28]
Ganesh Venkatesh, Jack Sampson, Nathan Goulding-Hotta, Sravanthi Kota Venkata, Michael Bedford Taylor, and Steven Swanson. 2011. QsCores: Trading Dark Silicon for Scalable Energy Efficiency with Quasi-Specific Cores. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44). Association for Computing Machinery, New York, NY, USA. isbn:9781450310536 https://doi.org/10.1145/2155620.2155640
[29]
Jian Weng, Sihao Liu, Vidushi Dadu, Zhengrong Wang, Preyas Shah, and Tony Nowatzki. 2020. DSAGEN: Synthesizing Programmable Spatial Accelerators. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). https://doi.org/10.1109/ISCA45697.2020.00032
[30]
Max Willsey, Vincent T. Lee, Alvin Cheung, Rastislav Bodík, and Luis Ceze. 2019. Iterative Search for Reconfigurable Accelerator Blocks With a Compiler in the Loop. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, https://doi.org/10.1109/TCAD.2018.2878194

Cited By

View all
  • (2025)Enhancing CGRA Efficiency Through Aligned Compute and Communication ProvisioningProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3669940.3707230(410-425)Online publication date: 30-Mar-2025
  • (2024)Intermittent-Aware Design Exploration of Systolic Array Using Various Non-Volatile Memory: A Comparative StudyMicromachines10.3390/mi1503034315:3(343)Online publication date: 29-Feb-2024
  • (2024)Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless CommunicationACM Transactions on Reconfigurable Technology and Systems10.1145/369588017:4(1-32)Online publication date: 22-Nov-2024
  • Show More Cited By

Index Terms

  1. APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis

        Recommendations

        Comments

        Information & Contributors

        Information

        Published In

        cover image ACM Conferences
        ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
        March 2023
        820 pages
        ISBN:9781450399180
        DOI:10.1145/3582016
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Sponsors

        In-Cooperation

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 25 March 2023

        Permissions

        Request permissions for this article.

        Check for updates

        Author Tags

        1. CGRA
        2. design space exploration
        3. domain-specific accelerators
        4. graph analysis
        5. hardware-software co-design
        6. processing elements
        7. reconfigurable accelerators
        8. subgraph

        Qualifiers

        • Research-article

        Funding Sources

        Conference

        ASPLOS '23

        Acceptance Rates

        Overall Acceptance Rate 535 of 2,713 submissions, 20%

        Upcoming Conference

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)248
        • Downloads (Last 6 weeks)7
        Reflects downloads up to 01 Mar 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2025)Enhancing CGRA Efficiency Through Aligned Compute and Communication ProvisioningProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3669940.3707230(410-425)Online publication date: 30-Mar-2025
        • (2024)Intermittent-Aware Design Exploration of Systolic Array Using Various Non-Volatile Memory: A Comparative StudyMicromachines10.3390/mi1503034315:3(343)Online publication date: 29-Feb-2024
        • (2024)Canalis: A Throughput-Optimized Framework for Real-Time Stream Processing of Wireless CommunicationACM Transactions on Reconfigurable Technology and Systems10.1145/369588017:4(1-32)Online publication date: 22-Nov-2024
        • (2024)ICED: An Integrated CGRA Framework Enabling DVFS-Aware Acceleration2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO61859.2024.00099(1338-1352)Online publication date: 2-Nov-2024
        • (2024)Coarse-grained reconfigurable architectures for radio baseband processing: A surveyJournal of Systems Architecture10.1016/j.sysarc.2024.103243154(103243)Online publication date: Sep-2024
        • (2023)An Open-Source $4 \times 8$ Coarse-Grained Reconfigurable Array Using SkyWater 130 nm Technology and Agile Hardware Design Flow2023 IEEE International Symposium on Circuits and Systems (ISCAS)10.1109/ISCAS46773.2023.10182052(1-5)Online publication date: 21-May-2023

        View Options

        Login options

        View options

        PDF

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        Figures

        Tables

        Media

        Share

        Share

        Share this Publication link

        Share on social media