Abstract
Recording users’ lives as short-form videos has become an emerging trend with the advance of mobile devices. These videos contain a wealth of information, but retrieving it requires a significant amount of computation. In this paper, we propose Deep action, a framework that leverages edge offloading to enable human action recognition on mobile devices. Deep action first samples frames from a video according to the accuracy requirement. The sampled frames are then compressed and fed into deep learning models to generate an action label. Considering the varying conditions of the wireless connection, we design an online scheduler that strategically offloads compressed video snippets to an edge server. Furthermore, we use OpenCL to implement the video compression-related operations on the mobile GPU, so that model inference and video compression can run in parallel on the mobile device. We implement Deep action on the Android OS and evaluate it on a commercial off-the-shelf mobile device and an edge server. The performance evaluation demonstrates that Deep action achieves up to 19× and 13× execution speedup compared to the local-only and remote-only strategies, respectively.
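To make the scheduler's role concrete, the sketch below illustrates a per-snippet offloading decision: offload only when the estimated upload time plus remote inference time undercuts the local inference time. This is a minimal sketch under assumed conditions; the function names, parameters, and numbers are hypothetical and do not reflect the paper's actual implementation.

# Minimal sketch of an edge-offloading decision for one compressed video snippet.
# All names and values here are illustrative assumptions, not the paper's API.

def should_offload(snippet_bytes: int,
                   local_infer_ms: float,
                   remote_infer_ms: float,
                   bandwidth_bytes_per_ms: float) -> bool:
    """Offload when estimated upload + remote inference beats local inference."""
    if bandwidth_bytes_per_ms <= 0:        # no usable wireless link: stay local
        return False
    upload_ms = snippet_bytes / bandwidth_bytes_per_ms
    return upload_ms + remote_infer_ms < local_infer_ms

# Example: a 200 KB compressed snippet over a ~1 MB/s link, with an edge model
# assumed to run 10x faster than on-device inference.
print(should_offload(snippet_bytes=200_000,
                     local_infer_ms=600.0,
                     remote_infer_ms=60.0,
                     bandwidth_bytes_per_ms=1_000.0))   # -> True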
Cite this article
Zhang, D., Zhang, H., Duan, S. et al. Deep action: A mobile action recognition framework using edge offloading. Peer-to-Peer Netw. Appl. 15, 324–339 (2022). https://doi.org/10.1007/s12083-021-01232-0