DOI: 10.1145/3318216.3363316

Exploring the capabilities of mobile devices in supporting deep learning

Published: 07 November 2019

Abstract

Deep neural networks (DNNs) have unleashed a new wave of applications on mobile devices, such as intelligent personal assistants. Most of these applications rely on cloud resources to perform deep learning. As mobile devices become increasingly powerful, users can perform more deep learning tasks directly on them; moreover, learning on the devices has important advantages, such as personalization, privacy, and responsiveness. However, a good understanding of the capabilities of modern mobile devices in supporting deep learning is generally lacking. To address this gap, this paper presents a comprehensive study of performing training and inference on mobile devices. It develops TensorFlow+, an extension of the widely used TensorFlow framework, to enable training DNNs on devices and to use the available GPUs to accelerate the learning tasks. The study focuses on four aspects: 1) the performance impact of the network architecture; 2) the effectiveness of using accelerators for learning on mobile devices; 3) the resource and battery usage of training and inference; and 4) the performance impact on other applications running on the devices. The results show that the size (width and depth) of a network, as well as the types of layers it uses, matters not only to staying within the device's capability but also to the performance of learning. The study also shows that hardware acceleration is important both to improving the speed of learning and to reducing the impact on other applications on the device.
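
The abstract's observation that a network's width and depth determine whether a learning task fits a device's capability can be made concrete with a back-of-the-envelope cost model. The sketch below is hypothetical and not from the paper or TensorFlow+; the functions `conv_cost` and `cnn_cost` are invented for illustration. It counts parameters and multiply-accumulate operations (MACs) for a plain stack of 3x3 convolutions, showing why widening a network (more channels) inflates cost roughly quadratically while deepening it inflates cost roughly linearly.

```python
# Hypothetical cost model for a plain CNN (illustration only, not the
# paper's methodology): count parameters and multiply-accumulates (MACs).

def conv_cost(h, w, c_in, c_out, k=3):
    """Cost of one k x k convolution at spatial resolution h x w."""
    params = k * k * c_in * c_out + c_out   # weights plus biases
    macs = k * k * c_in * c_out * h * w     # one MAC per weight per pixel
    return params, macs

def cnn_cost(depth, width, h=32, w=32, c_in=3):
    """Stack `depth` conv layers, each producing `width` channels."""
    total_params = 0
    total_macs = 0
    for i in range(depth):
        p, m = conv_cost(h, w, c_in if i == 0 else width, width)
        total_params += p
        total_macs += m
    return total_params, total_macs

# Baseline vs. a 2x wider and a 2x deeper variant.
p1, m1 = cnn_cost(depth=4, width=32)
p2, m2 = cnn_cost(depth=4, width=64)   # wider: cost grows ~quadratically
p3, m3 = cnn_cost(depth=8, width=32)   # deeper: cost grows ~linearly
```

Under this simple model, doubling the width roughly quadruples the MAC count, while doubling the depth roughly doubles it, which is consistent with the study's finding that both dimensions, not just total layer count, must be matched to the device's capability.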


Published In

SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing
November 2019
455 pages
ISBN:9781450367332
DOI:10.1145/3318216
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IEEE-CS/DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep learning
  2. edge computing
  3. mobile computing
  4. neural networks

Qualifiers

  • Research-article

Conference

SEC '19: The Fourth ACM/IEEE Symposium on Edge Computing
November 7-9, 2019
Arlington, Virginia

Acceptance Rates

SEC '19 Paper Acceptance Rate: 20 of 59 submissions, 34%
Overall Acceptance Rate: 40 of 100 submissions, 40%


Cited By

  • (2024) An Efficient Asynchronous Federated Learning Protocol for Edge Devices. IEEE Internet of Things Journal 11:17, 28798-28808. DOI: 10.1109/JIOT.2024.3406634
  • (2024) Attention to Monkeypox: An Interpretable Monkeypox Detection Technique Using Attention Mechanism. IEEE Access 12, 51942-51965. DOI: 10.1109/ACCESS.2024.3385099
  • (2023) EEFL: High-Speed Wireless Communications Inspired Energy Efficient Federated Learning over Mobile Devices. In Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 544-556. DOI: 10.1145/3581791.3596865
  • (2023) Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review. Proceedings of the IEEE 111:1, 42-91. DOI: 10.1109/JPROC.2022.3226481
  • (2023) A Reinforcement Learning Approach for Minimizing Job Completion Time in Clustered Federated Learning. In IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, 1-10. DOI: 10.1109/INFOCOM53939.2023.10228925
  • (2023) PAGroup: Privacy-aware grouping framework for high-performance federated learning. Journal of Parallel and Distributed Computing 175, 37-50. DOI: 10.1016/j.jpdc.2022.12.011
  • (2023) TongueMobile: automated tongue segmentation and diagnosis on smartphones. Neural Computing and Applications 35:28, 21259-21274. DOI: 10.1007/s00521-023-08902-5
  • (2022) CAMDNN: Content-Aware Mapping of a Network of Deep Neural Networks on Edge MPSoCs. IEEE Transactions on Computers, 1-12. DOI: 10.1109/TC.2022.3207137
  • (2022) FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning. In 2022 IEEE International Symposium on Workload Characterization (IISWC), 117-129. DOI: 10.1109/IISWC55918.2022.00020
  • (2022) Recursive SQL and GPU-support for in-database machine learning. Distributed and Parallel Databases 40:2-3, 205-259. DOI: 10.1007/s10619-022-07417-7
