ABSTRACT
From voice recognition to object detection, Deep Neural Networks (DNNs) are steadily getting better at extracting information from complex raw data. Combined with the popularity of mobile computing and the rise of the Internet-of-Things (IoT), there is enormous potential for widespread deployment of intelligent devices, but a computational challenge remains. A modern DNN can require billions of floating point operations to classify a single image, which is far too costly for energy-constrained mobile devices. Offloading DNNs to powerful servers in the cloud is only a limited solution, as it requires significant energy for data transfer and cannot address applications with low-latency requirements such as augmented reality or navigation for autonomous drones.
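To make the "billions of floating point operations" claim concrete, the sketch below estimates the per-image cost of a VGG-16-style convolutional stack. The layer shapes are approximations for illustration only (not measurements from this paper), and FLOPs are counted as two per multiply-accumulate:

```python
# Rough FLOP estimate for a VGG-16-style convolutional stack.
# Layer shapes are illustrative approximations, not figures from this paper.

def conv_flops(h, w, c_in, c_out, k=3):
    """FLOPs for one k x k convolution producing an h x w x c_out output,
    counting 2 FLOPs per multiply-accumulate (one multiply, one add)."""
    return 2 * h * w * c_in * c_out * k * k

# (out_height, out_width, in_channels, out_channels) for each conv layer,
# assuming a 224 x 224 RGB input image
layers = [
    (224, 224, 3, 64), (224, 224, 64, 64),
    (112, 112, 64, 128), (112, 112, 128, 128),
    (56, 56, 128, 256), (56, 56, 256, 256), (56, 56, 256, 256),
    (28, 28, 256, 512), (28, 28, 512, 512), (28, 28, 512, 512),
    (14, 14, 512, 512), (14, 14, 512, 512), (14, 14, 512, 512),
]

total = sum(conv_flops(*layer) for layer in layers)
print(f"{total / 1e9:.1f} GFLOPs per image")  # prints "30.7 GFLOPs per image"
```

At roughly 30 GFLOPs per classified image, sustained real-time inference is well beyond the energy budget of a battery-powered mobile or IoT device, which motivates the low-precision and quantized approaches discussed in this work.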