Abstract
Recording users’ lives as short-form videos has become an emerging trend with the advance of mobile devices. These videos contain a wealth of information, but retrieving it requires a significant amount of computation. In this paper, we propose Deep action, a framework that leverages edge offloading to enable human action recognition on mobile devices. Deep action first samples frames from a video according to the accuracy requirement. The sampled frames are then compressed and fed into deep learning models to generate an action label. Considering the varying conditions of the wireless connection, we design an online scheduler that strategically offloads compressed video snippets to an edge server. Furthermore, we use OpenCL to implement the video compression-related operations on the mobile GPU, so that model inference and video compression can run in parallel on the mobile device. We implement Deep action on the Android OS and evaluate it on a commercial off-the-shelf mobile device and an edge server. The performance evaluation demonstrates that Deep action achieves up to 19× and 13× execution speedup compared to the local-only and remote-only strategies, respectively.
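To make the scheduler's role concrete, the sketch below illustrates a per-snippet offloading decision: offload only when the estimated upload time plus remote inference time undercuts the local inference time. This is a minimal sketch under assumed conditions; the function names, parameters, and numbers are hypothetical and do not reflect the paper's actual implementation.

# Minimal sketch of an edge-offloading decision for one compressed video snippet.
# All names and values here are illustrative assumptions, not the paper's API.

def should_offload(snippet_bytes: int,
                   local_infer_ms: float,
                   remote_infer_ms: float,
                   bandwidth_bytes_per_ms: float) -> bool:
    """Offload when estimated upload + remote inference beats local inference."""
    if bandwidth_bytes_per_ms <= 0:        # no usable wireless link: stay local
        return False
    upload_ms = snippet_bytes / bandwidth_bytes_per_ms
    return upload_ms + remote_infer_ms < local_infer_ms

# Example: a 200 KB compressed snippet over a ~1 MB/s link, with an edge model
# assumed to run 10x faster than on-device inference.
print(should_offload(snippet_bytes=200_000,
                     local_infer_ms=600.0,
                     remote_infer_ms=60.0,
                     bandwidth_bytes_per_ms=1_000.0))   # -> True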
Cite this article
Zhang, D., Zhang, H., Duan, S. et al. Deep action: A mobile action recognition framework using edge offloading. Peer-to-Peer Netw. Appl. 15, 324–339 (2022). https://doi.org/10.1007/s12083-021-01232-0