DOI: 10.1145/3318216.3363316

Exploring the capabilities of mobile devices in supporting deep learning

Published: 07 November 2019

Abstract

Deep neural networks (DNNs) have unleashed a new wave of applications on mobile devices, such as intelligent personal assistants. Most of these applications rely on cloud resources to perform deep learning. As mobile devices become increasingly powerful, users can perform more deep learning tasks directly on them; moreover, learning on the devices has important advantages, such as personalization, privacy, and responsiveness. However, a good understanding of the capabilities of modern mobile devices in supporting deep learning is generally lacking. To address this gap, this paper presents a comprehensive study of performing training and inference on mobile devices. It develops TensorFlow+, an extension of the widely used TensorFlow framework, to enable training DNNs on devices and to use the available GPUs to accelerate the learning tasks. The study focuses on four aspects: 1) the performance impact of the network architecture; 2) the effectiveness of using accelerators for learning on mobile devices; 3) the resource and battery usage of training and inference; and 4) the performance impact on other applications running on the devices. The results show that the size (width and depth) of a network, as well as the types of layers it uses, matters not only to staying within the device's capability but also to the performance of learning. The study also shows that hardware acceleration is important both to improving the speed of learning and to reducing the impact on other applications on the device.
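
The abstract's observation that a network's width and depth determine whether a learning task fits a device's capability can be made concrete with a back-of-the-envelope cost model. The sketch below is hypothetical and not from the paper or TensorFlow+; the functions `conv_cost` and `cnn_cost` are invented for illustration. It counts parameters and multiply-accumulate operations (MACs) for a plain stack of 3x3 convolutions, showing why widening a network (more channels) inflates cost roughly quadratically while deepening it inflates cost roughly linearly.

```python
# Hypothetical cost model for a plain CNN (illustration only, not the
# paper's methodology): count parameters and multiply-accumulates (MACs).

def conv_cost(h, w, c_in, c_out, k=3):
    """Cost of one k x k convolution at spatial resolution h x w."""
    params = k * k * c_in * c_out + c_out   # weights plus biases
    macs = k * k * c_in * c_out * h * w     # one MAC per weight per pixel
    return params, macs

def cnn_cost(depth, width, h=32, w=32, c_in=3):
    """Stack `depth` conv layers, each producing `width` channels."""
    total_params = 0
    total_macs = 0
    for i in range(depth):
        p, m = conv_cost(h, w, c_in if i == 0 else width, width)
        total_params += p
        total_macs += m
    return total_params, total_macs

# Baseline vs. a 2x wider and a 2x deeper variant.
p1, m1 = cnn_cost(depth=4, width=32)
p2, m2 = cnn_cost(depth=4, width=64)   # wider: cost grows ~quadratically
p3, m3 = cnn_cost(depth=8, width=32)   # deeper: cost grows ~linearly
```

Under this simple model, doubling the width roughly quadruples the MAC count, while doubling the depth roughly doubles it, which is consistent with the study's finding that both dimensions, not just total layer count, must be matched to the device's capability.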


Published In

SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing
November 2019
455 pages
ISBN:9781450367332
DOI:10.1145/3318216
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IEEE-CS/DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep learning
  2. edge computing
  3. mobile computing
  4. neural networks

Qualifiers

  • Research-article

Conference

SEC '19: The Fourth ACM/IEEE Symposium on Edge Computing
November 7-9, 2019
Arlington, Virginia

Acceptance Rates

SEC '19 Paper Acceptance Rate: 20 of 59 submissions, 34%
Overall Acceptance Rate: 40 of 100 submissions, 40%


Cited By

  • (2024) An Efficient Asynchronous Federated Learning Protocol for Edge Devices. IEEE Internet of Things Journal 11:17, 28798-28808. DOI: 10.1109/JIOT.2024.3406634
  • (2024) Attention to Monkeypox: An Interpretable Monkeypox Detection Technique Using Attention Mechanism. IEEE Access 12, 51942-51965. DOI: 10.1109/ACCESS.2024.3385099
  • (2023) EEFL: High-Speed Wireless Communications Inspired Energy Efficient Federated Learning over Mobile Devices. In Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 544-556. DOI: 10.1145/3581791.3596865
  • (2023) Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review. Proceedings of the IEEE 111:1, 42-91. DOI: 10.1109/JPROC.2022.3226481
  • (2023) A Reinforcement Learning Approach for Minimizing Job Completion Time in Clustered Federated Learning. In IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, 1-10. DOI: 10.1109/INFOCOM53939.2023.10228925
  • (2023) PAGroup: Privacy-aware grouping framework for high-performance federated learning. Journal of Parallel and Distributed Computing 175, 37-50. DOI: 10.1016/j.jpdc.2022.12.011
  • (2023) TongueMobile: automated tongue segmentation and diagnosis on smartphones. Neural Computing and Applications 35:28, 21259-21274. DOI: 10.1007/s00521-023-08902-5
  • (2022) CAMDNN: Content-Aware Mapping of a Network of Deep Neural Networks on Edge MPSoCs. IEEE Transactions on Computers, 1-12. DOI: 10.1109/TC.2022.3207137
  • (2022) FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning. In 2022 IEEE International Symposium on Workload Characterization (IISWC), 117-129. DOI: 10.1109/IISWC55918.2022.00020
  • (2022) Recursive SQL and GPU-support for in-database machine learning. Distributed and Parallel Databases 40:2-3, 205-259. DOI: 10.1007/s10619-022-07417-7
