ABSTRACT
Deep learning has the potential to make Augmented Reality (AR) devices smarter, but few AR apps use such technology today because it is compute-intensive, and front-end devices cannot deliver sufficient compute power. We propose a distributed framework that ties together front-end devices with more powerful back-end "helpers" that allow deep learning to be executed locally or to be offloaded. This framework should be able to intelligently use current estimates of network conditions and back-end server loads, in conjunction with the application's requirements, to determine an optimal strategy.
This work reports our preliminary investigation into implementing such a framework, in which the front-end devices are assumed to be smartphones. Our specific contributions include: (1) development of an Android application that performs real-time object detection, either locally on the smartphone or remotely on a server; and (2) characterization of the tradeoffs between object detection accuracy, latency, and battery drain, based on the system parameters of video resolution, deep learning model size, and offloading decision.
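To make the offloading decision described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple latency model (round-trip time plus frame transfer time plus compute time, with server compute inflated by load) and picks the most accurate configuration that meets an application latency budget. All names (`Option`, `estimated_latency_ms`, `choose`) and all numbers are hypothetical illustrations; battery drain, which the paper also characterizes, is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Option:
    """One candidate configuration. All values are illustrative, not measured."""
    name: str
    offload: bool        # True: run on the back-end server; False: run on the phone
    accuracy: float      # assumed detection accuracy for this model size / resolution
    compute_ms: float    # assumed inference time on whichever machine runs the model
    frame_kbits: float   # assumed encoded frame size; only used when offloading


def estimated_latency_ms(opt: Option, bandwidth_kbps: float,
                         rtt_ms: float, server_load: float) -> float:
    """Estimate per-frame latency under an assumed, deliberately crude model.

    server_load is a hypothetical utilization in [0, 1); remote compute time
    is scaled by 1 / (1 - server_load) to approximate queueing delay.
    """
    if not opt.offload:
        return opt.compute_ms
    transfer_ms = opt.frame_kbits / bandwidth_kbps * 1000.0
    return rtt_ms + transfer_ms + opt.compute_ms / (1.0 - server_load)


def choose(options: List[Option], bandwidth_kbps: float, rtt_ms: float,
           server_load: float, max_latency_ms: float) -> Optional[Option]:
    """Return the most accurate option that fits the latency budget, or None."""
    feasible = [
        o for o in options
        if estimated_latency_ms(o, bandwidth_kbps, rtt_ms, server_load) <= max_latency_ms
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda o: o.accuracy)


# Hypothetical candidates: a small on-device model and a large server-side model.
OPTIONS = [
    Option("local_small", offload=False, accuracy=0.50, compute_ms=400.0, frame_kbits=0.0),
    Option("remote_big", offload=True, accuracy=0.75, compute_ms=30.0, frame_kbits=800.0),
]

# Fast network: offloading fits the budget and is more accurate.
good = choose(OPTIONS, bandwidth_kbps=20000.0, rtt_ms=20.0, server_load=0.5, max_latency_ms=500.0)
# Slow network: frame transfer dominates, so the local model is chosen.
poor = choose(OPTIONS, bandwidth_kbps=1000.0, rtt_ms=20.0, server_load=0.5, max_latency_ms=500.0)
```

Under these assumed numbers, the same budget leads to offloading on the fast link and to local execution on the slow one, which is the kind of condition-dependent choice the framework is meant to make. A fuller version would also weigh battery drain and video resolution.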
Index Terms
- Delivering Deep Learning to Mobile Devices via Offloading