NEMO: enabling neural-enhanced video streaming on commodity mobile devices

Authors:

Dongsu Han

MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking

Article No.: 28, Pages 1 - 14

https://doi.org/10.1145/3372224.3419185

Published: 18 September 2020

Abstract

The demand for mobile video streaming has grown tremendously over the last decade. However, existing video delivery methods fall short of delivering high-quality video. Recent advances in neural super-resolution make it possible to enhance video quality using client-side computation. Unfortunately, mobile devices cannot benefit from this because super-resolution is too computationally expensive and power-hungry.

To overcome this limitation, we present NEMO, a system that enables real-time video super-resolution on mobile devices. NEMO applies neural super-resolution to a few select frames and reuses the outputs to enhance the remaining frames. The frames to which super-resolution is applied are carefully chosen to maximize the overall quality gain. NEMO leverages fine-grained dependencies using information from the video codec and strives to provide guarantees on the quality degradation relative to per-frame super-resolution. Our evaluation using a full system implementation on Android shows that, compared to per-frame super-resolution, NEMO improves the overall processing throughput by 11.5×, reduces energy consumption by 88.6%, and maintains device temperatures at acceptable levels, while ensuring high video quality. Overall, this leads to a 31.2% improvement in quality of experience for mobile users.
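The idea of enhancing only a few frames while bounding the loss against per-frame super-resolution can be illustrated with a toy sketch. This is not NEMO's actual algorithm: the linear quality-decay model, the greedy search, and every name and number below (`frame_quality`, `select_anchors`, `base_q`, `sr_q`, `decay`, `max_loss`) are assumptions made purely for illustration. In this sketch, an enhanced ("anchor") frame gets the full super-resolution quality, and each later frame that reuses its output loses a fixed amount of quality per frame of distance.

```python
from typing import List


def frame_quality(i: int, anchors: List[int], base_q: float,
                  sr_q: float, decay: float) -> float:
    """Toy quality (in dB) of frame i under an assumed linear decay model.

    The most recent anchor at or before frame i provides the full
    super-resolution quality sr_q, reduced by `decay` dB per frame of
    distance. Quality never drops below base_q (no enhancement at all).
    """
    previous = [a for a in anchors if a <= i]
    if not previous:
        return base_q
    return max(base_q, sr_q - decay * (i - max(previous)))


def average_quality(n_frames: int, anchors: List[int], base_q: float,
                    sr_q: float, decay: float) -> float:
    """Mean toy quality over all frames for a given anchor set."""
    total = sum(frame_quality(i, anchors, base_q, sr_q, decay)
                for i in range(n_frames))
    return total / n_frames


def select_anchors(n_frames: int, base_q: float, sr_q: float,
                   decay: float, max_loss: float) -> List[int]:
    """Greedily add anchors until the average quality is within
    max_loss dB of per-frame super-resolution (which scores sr_q)."""
    anchors: List[int] = []
    target = sr_q - max_loss
    while average_quality(n_frames, anchors, base_q, sr_q, decay) < target:
        candidates = [i for i in range(n_frames) if i not in anchors]
        # Pick the frame whose enhancement raises the average the most.
        best = max(candidates, key=lambda c: average_quality(
            n_frames, anchors + [c], base_q, sr_q, decay))
        anchors.append(best)
    return sorted(anchors)


if __name__ == "__main__":
    chosen = select_anchors(n_frames=8, base_q=30.0, sr_q=35.0,
                            decay=0.5, max_loss=0.5)
    print("anchor frames:", chosen)
    print("average quality:",
          average_quality(8, chosen, 30.0, 35.0, 0.5))
```

Under these assumed numbers, only a subset of the eight frames needs the expensive model to stay within the 0.5 dB budget. The real system derives frame dependencies from the video codec rather than from a fixed decay constant.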

Information

Published In

MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking

April 2020

621 pages

ISBN:9781450370851

DOI:10.1145/3372224

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGMOBILE: ACM Special Interest Group on Mobility of Systems, Users, Data and Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Institute for Information & communications Technology Promotion (IITP)

Conference

MobiCom '20

Sponsor:

SIGMOBILE

MobiCom '20: The 26th Annual International Conference on Mobile Computing and Networking

September 21 - 25, 2020

London, United Kingdom

Acceptance Rates

Overall Acceptance Rate 440 of 2,972 submissions, 15%
