ABSTRACT
Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. For fast on-device DL inference, DL libraries matter as much as algorithms and hardware do. Unfortunately, no prior work has examined the ecosystem of modern DL libraries in depth or quantified their performance. In this paper, we first build a comprehensive benchmark that includes 6 representative DL libraries and 15 diverse DL models. We then perform extensive experiments on 10 mobile devices, which reveal a complete picture of the current mobile DL library ecosystem. For example, we find that no single DL library performs best: the best-performing library is severely fragmented across models and hardware, and the performance gap between libraries can be very large. In fact, the choice of DL library can have a larger impact than algorithm- or hardware-level optimizations such as model quantization and GPU/DSP-based heterogeneous computing. Finally, based on these observations, we summarize practical implications for the different roles in the DL library ecosystem.
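To make two of the headline measurements concrete (which library wins for a given model and device, and how large the gap between libraries is), here is a minimal Python sketch of how a table of latency measurements could be summarized. It is not the authors' analysis code. The data layout (a dictionary keyed by library, model, and device that holds mean inference latency in milliseconds) and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

# Hypothetical layout: latency_ms[(library, model, device)] = mean latency in ms (assumed > 0).
Latencies = Dict[Tuple[str, str, str], float]


def best_library_per_setting(latency_ms: Latencies) -> Dict[Tuple[str, str], str]:
    """Return the fastest library for every measured (model, device) pair."""
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for (lib, model, device), ms in latency_ms.items():
        key = (model, device)
        # Strict '<' keeps the first library seen when two are tied.
        if key not in best or ms < best[key][1]:
            best[key] = (lib, ms)
    return {key: lib for key, (lib, _) in best.items()}


def max_slowdown_per_setting(latency_ms: Latencies) -> Dict[Tuple[str, str], float]:
    """Ratio of the slowest to the fastest library's latency per (model, device) pair."""
    per_setting = defaultdict(list)
    for (_, model, device), ms in latency_ms.items():
        per_setting[(model, device)].append(ms)
    return {key: max(v) / min(v) for key, v in per_setting.items()}


def fragmentation(latency_ms: Latencies) -> Counter:
    """Count how many (model, device) settings each library wins."""
    return Counter(best_library_per_setting(latency_ms).values())
```

If more than one library wins at least one setting, the counter returned by `fragmentation` has several entries. A ratio well above 1 from `max_slowdown_per_setting` means the choice of library matters a lot for that model and device.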
Index Terms
- A Comprehensive Benchmark of Deep Learning Libraries on Mobile Devices