ABSTRACT
Over-the-top (OTT) streaming services like YouTube and Netflix generate massive amounts of video traffic. To reduce the resulting network load, this article empirically explores the object-based video (OBV) methodology, which allows the background and the foreground object(s) of a video scene to be streamed at different quality levels via HTTP Adaptive Streaming. In particular, we study two alternative video object representation methods: the first meticulously follows the object contour, while the second uses axis-aligned bounding box enclosures. We subjectively compare both techniques to traditional, frame-based video compression in the context of live-action content featuring talking persons. The resulting mixed-methods data show that (i) OBV-informed users tolerate substantial background quality degradations, and (ii) at an average bitrate reduction of 14 percent, perceptual differences between contour-based OBV and traditional encoding are small or even non-existent for the non-movie content in our corpus. Although our evaluation focuses on interview-like footage, our qualitative data hint that the presented results might extend to other video genres. As such, our findings inform content owners and network operators about video bitrate saving opportunities with marginal perceptual impact.
Supplemental Material
Auxiliary video (H.264/AVC video codec, AAC audio codec) for our ACM MM 2019 paper titled "Talking Video Heads - Saving Streaming Bitrate by Adaptively Applying Object-based Video Principles to Interview-like Footage".
Index Terms
- Talking Video Heads: Saving Streaming Bitrate by Adaptively Applying Object-based Video Principles to Interview-like Footage