NEMO: enabling neural-enhanced video streaming on commodity mobile devices

Published: 18 September 2020

Abstract

The demand for mobile video streaming has grown tremendously over the last decade. However, existing video delivery methods fall short of delivering high-quality video. Recent advances in neural super-resolution have opened up the possibility of enhancing video quality by leveraging client-side computation. Unfortunately, mobile devices cannot benefit from this because neural super-resolution is too computationally expensive and power-hungry.
To overcome this limitation, we present NEMO, a system that enables real-time video super-resolution on mobile devices. NEMO applies neural super-resolution to a few select frames and transfers the outputs to benefit the remaining frames. The frames to which super-resolution is applied are carefully chosen to maximize the overall quality gain. NEMO leverages fine-grained dependencies using information from the video codec and strives to provide guarantees on the quality degradation compared to per-frame super-resolution. Our evaluation using a full system implementation on Android shows that, compared to per-frame super-resolution, NEMO improves the overall processing throughput by 11.5×, reduces energy consumption by 88.6%, and maintains device temperatures at acceptable levels, while ensuring high video quality. Overall, this leads to a 31.2% improvement in quality of experience for mobile users.
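
The idea of applying super-resolution to a few chosen frames under a quality-loss budget can be illustrated with a minimal sketch. This is not NEMO's actual algorithm: the quality model (a fixed per-frame `decay` when a frame reuses an earlier super-resolved output), the greedy search, and the names `estimated_quality`, `select_anchors`, and `max_loss` are all assumptions made for illustration.

```python
from typing import List


def estimated_quality(num_frames: int, anchors: List[int], decay: float = 0.9) -> List[float]:
    """Toy quality model (an assumption, not taken from the paper).

    A frame that is super-resolved directly (an "anchor") gets gain 1.0.
    Any other frame reuses the most recent anchor's output and loses a
    factor `decay` per frame of distance. Frames before the first anchor
    get no gain.
    """
    anchor_set = set(anchors)
    quality: List[float] = []
    last_anchor = None
    for i in range(num_frames):
        if i in anchor_set:
            last_anchor = i
        if last_anchor is None:
            quality.append(0.0)
        else:
            quality.append(decay ** (i - last_anchor))
    return quality


def select_anchors(num_frames: int, max_loss: float, decay: float = 0.9) -> List[int]:
    """Greedily pick anchor frames until the mean quality loss relative to
    per-frame super-resolution (mean gain 1.0 in this toy model) is at most
    `max_loss`."""
    if num_frames <= 0:
        return []

    def mean_quality(candidate_anchors: List[int]) -> float:
        return sum(estimated_quality(num_frames, candidate_anchors, decay)) / num_frames

    anchors: List[int] = []
    while len(anchors) < num_frames and 1.0 - mean_quality(anchors) > max_loss:
        remaining = [i for i in range(num_frames) if i not in anchors]
        # Add the frame whose selection raises the mean quality the most.
        best = max(remaining, key=lambda i: mean_quality(anchors + [i]))
        anchors.append(best)
    return sorted(anchors)


if __name__ == "__main__":
    # Hypothetical 8-frame chunk with a 20% loss budget.
    print(select_anchors(8, max_loss=0.2))
```

In this toy setting a looser `max_loss` yields fewer anchors and therefore less inference work, while `max_loss=0.0` degenerates to per-frame super-resolution.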

Information & Contributors

Information

Published In

cover image ACM Conferences
MobiCom '20: Proceedings of the 26th Annual International Conference on Mobile Computing and Networking
April 2020
621 pages
ISBN:9781450370851
DOI:10.1145/3372224
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep neural networks
  2. mobile computing
  3. super-resolution
  4. video codec
  5. video streaming

Qualifiers

  • Research-article

Funding Sources

  • Institute for Information & communications Technology Promotion (IITP)

Conference

MobiCom '20
Acceptance Rates

Overall Acceptance Rate 440 of 2,972 submissions, 15%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 340
  • Downloads (last 6 weeks): 32
Reflects downloads up to 07 Mar 2025

Citations

Cited By

  • (2025) REM: Enabling Real-Time Neural-Enhanced Video Streaming on Mobile Devices Using Macroblock-Aware Lookup Table. IEEE Transactions on Mobile Computing 24:3 (2085-2097). DOI: 10.1109/TMC.2024.3496443. Online publication date: Mar-2025
  • (2025) Collaborative Video Streaming With Super-Resolution in Multi-User MEC Networks. IEEE Transactions on Mobile Computing 24:2 (571-584). DOI: 10.1109/TMC.2024.3461685. Online publication date: Feb-2025
  • (2025) Muno: Improved Bandwidth Estimation Scheme in Video Conferencing Using Deep Reinforcement Learning. International Journal of Network Management 35:1. DOI: 10.1002/nem.2323. Online publication date: 8-Jan-2025
  • (2024) Gemino. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (569-590). DOI: 10.5555/3691825.3691857. Online publication date: 16-Apr-2024
  • (2024) VIGOR: Reviving Cloud Gaming Sessions. Proceedings of the ACM on Networking 2:CoNEXT4 (1-20). DOI: 10.1145/3696403. Online publication date: 25-Nov-2024
  • (2024) On-device Training: A First Overview on Existing Systems. ACM Transactions on Sensor Networks 20:6 (1-39). DOI: 10.1145/3696003. Online publication date: 14-Sep-2024
  • (2024) Artificial Intelligence of Things: A Survey. ACM Transactions on Sensor Networks 21:1 (1-75). DOI: 10.1145/3690639. Online publication date: 30-Aug-2024
  • (2024) Mustang: Improving QoE for Real-Time Video in Cellular Networks by Masking Jitter. ACM Transactions on Multimedia Computing, Communications, and Applications 20:9 (1-23). DOI: 10.1145/3672399. Online publication date: 10-Jun-2024
  • (2024) Twist: A Multi-site Transmission Solution for On-demand Video Streaming. Proceedings of the ACM on Networking 2:CoNEXT2 (1-19). DOI: 10.1145/3656297. Online publication date: 13-Jun-2024
  • (2024) BONES: Near-Optimal Neural-Enhanced Video Streaming. Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:2 (1-28). DOI: 10.1145/3656014. Online publication date: 29-May-2024
