DOI: 10.1145/3587135.3592208

Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support

Published: 04 August 2023

Abstract

The growing demand for deep learning applications has driven the design and development of several hardware accelerators that increase performance and energy efficiency. Convolutional accelerators in particular receive much attention because of their applicability in many fields. Another aspect gaining increasing attention is the use of a shared virtual address space between the processor and accelerators, which can improve programmability and security. However, a shared address space relies on an IOMMU to service address translation requests, and IOMMU accesses are time-consuming. In this work, we analyze workloads on convolutional accelerators and identify how sensitive their performance is to IOMMU activity. Based on this analysis, we propose dedicated accelerator registers (Translation Registers) that reduce costly IOMMU accesses. Translation Registers reduce execution time by about 20% and the energy consumed by address translation by up to about 55%.
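The abstract's key idea, that streaming convolutional accesses can reuse a recently obtained translation held in a dedicated register instead of querying the IOMMU on every access, can be illustrated with a small simulation. This is a hedged sketch, not the paper's implementation: the class names (`IOMMU`, `TranslationRegisters`), the one-register-per-tensor-stream policy, and the 4 KiB page size are all illustrative assumptions.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

class IOMMU:
    """Models a costly page-table walk; counts how often it is invoked."""
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.walks = 0
    def translate(self, vpn):
        self.walks += 1
        return self.page_table[vpn]

class TranslationRegisters:
    """One register per tensor stream, holding that stream's last translation.

    Convolutional accelerators stream tensors largely sequentially, so
    consecutive accesses usually fall in the same page and the cached
    translation can be reused without touching the IOMMU.
    """
    def __init__(self, iommu):
        self.iommu = iommu
        self.regs = {}  # stream id -> (vpn, pfn)
    def translate(self, stream, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        cached = self.regs.get(stream)
        if cached and cached[0] == vpn:   # register hit: no IOMMU access
            pfn = cached[1]
        else:                             # register miss: fall back to the IOMMU
            pfn = self.iommu.translate(vpn)
            self.regs[stream] = (vpn, pfn)
        return pfn * PAGE_SIZE + offset

# Usage: stream 8 KiB of input activations (two pages) word by word.
page_table = {0: 10, 1: 11}
iommu = IOMMU(page_table)
tr = TranslationRegisters(iommu)
for vaddr in range(0, 2 * PAGE_SIZE, 4):
    tr.translate("ifmap", vaddr)
print(iommu.walks)  # prints 2: two walks instead of 2048 per-access translations
```

In this toy model the 2048 sequential word accesses trigger only two IOMMU walks, one per page crossed, which is the kind of reduction in translation traffic the Translation Registers aim for.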


Cited By

  • (2024) "Integration of RISC-V Page Table Walk in gem5 SE Mode". Proceedings of the 16th Workshop on Rapid Simulation and Performance Evaluation for Design. DOI: 10.1145/3642921.3642926, 22-28. Online publication date: 18-Jan-2024.

Published In

CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers
May 2023
419 pages
ISBN:9798400701405
DOI:10.1145/3587135
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Convolutional Neural Network
  2. Deep Learning
  3. Hardware Accelerator
  4. IOMMU
  5. Virtual Memory
  6. Virtual Shared Address Space

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CF '23

Acceptance Rates

CF '23 Paper Acceptance Rate 24 of 66 submissions, 36%;
Overall Acceptance Rate 273 of 785 submissions, 35%
