DOI: 10.1145/3400286.3418240
Research Article

Toward Fast Platform-Aware Neural Architecture Search for FPGA-Accelerated Edge AI Applications

Published: 25 November 2020

Abstract

Neural Architecture Search (NAS) is a technique for finding suitable neural network architectures for a given application. Such searches have typically been driven by reinforcement learning, with a recurrent neural network generating candidate network models. However, most NAS methods aim to find a set of candidates with the best cost-performance ratios, e.g., high accuracy and low computing time, based on rough estimates derived generically from the workload. Since today's deep learning chips accelerate neural network operations with a variety of hardware techniques, such as vector units and low-precision data formats, metrics estimated from generic operation counts, such as floating-point operations (FLOPs), can differ greatly from the actual latency, throughput, and power consumption, which are highly sensitive to the hardware design and even the software optimization in edge AI applications. Thus, instead of spending a long time repeatedly picking and training so-called good candidates based on unreliable estimates, we propose a NAS framework that accelerates the search by incorporating actual performance measurements into the search process. These measurements allow the framework to rank candidates on correct information, reducing the chance of selecting wrong candidates and wasting search time on them. To illustrate its effectiveness, we prototyped the framework to work with Intel OpenVINO and Field Programmable Gate Arrays (FPGAs) to meet the accuracy and latency required by the user: the framework takes the dataset and the accuracy and latency requirements from the user, then automatically searches for candidates that meet those requirements. Case studies and experimental results are presented in this paper to evaluate the effectiveness of our framework for edge AI applications in real-time image classification.
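
The measurement-in-the-loop search the abstract describes can be summarized in a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the toy search space and the callables train_and_eval and measure_latency_ms are hypothetical placeholders, where measure_latency_ms stands in for timing a candidate compiled through OpenVINO on the target FPGA instead of estimating cost from FLOPs.

import random

# Hypothetical toy search space; the paper's actual space is defined
# by its NAS framework and is not reproduced here.
SEARCH_SPACE = {
    "depth": [8, 14, 20],
    "width": [16, 32, 64],
    "kernel_size": [3, 5],
}

def sample_candidate():
    """Randomly sample one architecture configuration."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def search(n_trials, max_latency_ms, min_accuracy,
           train_and_eval, measure_latency_ms):
    """Return candidates that meet the user's accuracy/latency targets.

    train_and_eval(arch)     -> validation accuracy after short training.
    measure_latency_ms(arch) -> latency measured on the real target device
                                (e.g., an OpenVINO model on an FPGA), used
                                in place of proxy metrics such as FLOPs.
    """
    survivors = []
    for _ in range(n_trials):
        arch = sample_candidate()
        # Measure on hardware first: measurement is cheap relative to
        # training, so candidates that cannot meet the latency budget
        # are discarded before any training time is spent on them.
        latency = measure_latency_ms(arch)
        if latency > max_latency_ms:
            continue
        accuracy = train_and_eval(arch)
        if accuracy >= min_accuracy:
            survivors.append((arch, accuracy, latency))
    return survivors

# Usage with stub callables (real versions would train on the user's
# dataset and time inference on the FPGA):
# results = search(100, max_latency_ms=5.0, min_accuracy=0.9,
#                  train_and_eval=lambda a: 0.92,
#                  measure_latency_ms=lambda a: 4.0)

Measuring before training mirrors the abstract's central point: real hardware measurements, rather than unreliable proxy estimates, decide which candidates are worth training.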


Cited By

  • Energy-Efficient Edge Intelligence: A Comparative Analysis of AIoT Technologies. Mobile Networks and Applications 29(1):147-155, 2023. DOI: 10.1007/s11036-023-02122-w
  • AutoTinyML for microcontrollers: Dealing with black-box deployability. Expert Systems with Applications 207:117876, 2022. DOI: 10.1016/j.eswa.2022.117876


Published In

RACS '20: Proceedings of the International Conference on Research in Adaptive and Convergent Systems
October 2020
300 pages
ISBN:9781450380256
DOI:10.1145/3400286

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. AI
  2. Deep Learning
  3. Edge Computing
  4. FPGA
  5. OpenVINO
  6. GPU
  7. Neural Architecture Search
  8. Performance Evaluation
  9. Reinforcement Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RACS '20

Acceptance Rates

RACS '20 Paper Acceptance Rate: 42 of 148 submissions, 28%
Overall Acceptance Rate: 393 of 1,581 submissions, 25%
