DOI: 10.1145/3587135.3592208

Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support

Published: 04 August 2023

Abstract

The growing demand for deep learning applications has driven the design and development of several hardware accelerators that increase performance and energy efficiency. Convolutional accelerators in particular receive much attention because of their applicability in many fields. Another aspect gaining increasing attention is the use of a shared virtual address space between the processor and accelerators, which can improve programmability and security. However, a shared address space relies on an IOMMU to service address translation requests, and IOMMU accesses are time-consuming. In this work, we analyze workloads on convolutional accelerators and identify how sensitive their performance is to IOMMU activity. Based on this analysis, we propose dedicated accelerator registers (Translation Registers) that reduce costly IOMMU accesses. Translation Registers reduce execution time by about 20% and the energy consumed by address translation by up to about 55%.
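The abstract's key idea, that streaming convolutional accesses can reuse a recently obtained translation held in a dedicated register instead of querying the IOMMU on every access, can be illustrated with a small simulation. This is a hedged sketch, not the paper's implementation: the class names (`IOMMU`, `TranslationRegisters`), the one-register-per-tensor-stream policy, and the 4 KiB page size are all illustrative assumptions.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

class IOMMU:
    """Models a costly page-table walk; counts how often it is invoked."""
    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.walks = 0
    def translate(self, vpn):
        self.walks += 1
        return self.page_table[vpn]

class TranslationRegisters:
    """One register per tensor stream, holding that stream's last translation.

    Convolutional accelerators stream tensors largely sequentially, so
    consecutive accesses usually fall in the same page and the cached
    translation can be reused without touching the IOMMU.
    """
    def __init__(self, iommu):
        self.iommu = iommu
        self.regs = {}  # stream id -> (vpn, pfn)
    def translate(self, stream, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        cached = self.regs.get(stream)
        if cached and cached[0] == vpn:   # register hit: no IOMMU access
            pfn = cached[1]
        else:                             # register miss: fall back to the IOMMU
            pfn = self.iommu.translate(vpn)
            self.regs[stream] = (vpn, pfn)
        return pfn * PAGE_SIZE + offset

# Usage: stream 8 KiB of input activations (two pages) word by word.
page_table = {0: 10, 1: 11}
iommu = IOMMU(page_table)
tr = TranslationRegisters(iommu)
for vaddr in range(0, 2 * PAGE_SIZE, 4):
    tr.translate("ifmap", vaddr)
print(iommu.walks)  # prints 2: two walks instead of 2048 per-access translations
```

In this toy model the 2048 sequential word accesses trigger only two IOMMU walks, one per page crossed, which is the kind of reduction in translation traffic the Translation Registers aim for.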


Cited By

  • (2024) "Integration of RISC-V Page Table Walk in gem5 SE Mode". Proceedings of the 16th Workshop on Rapid Simulation and Performance Evaluation for Design. DOI: 10.1145/3642921.3642926, 22-28. Online publication date: 18-Jan-2024.

Published In

CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers
May 2023
419 pages
ISBN:9798400701405
DOI:10.1145/3587135
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Convolutional Neural Network
  2. Deep Learning
  3. Hardware Accelerator
  4. IOMMU
  5. Virtual Memory
  6. Virtual Shared Address Space

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CF '23

Acceptance Rates

CF '23 Paper Acceptance Rate 24 of 66 submissions, 36%;
Overall Acceptance Rate 273 of 785 submissions, 35%
