research-article

Assessment of Machine Learning-Based Audiovisual Quality Predictors: Why Uncertainty Matters

Published: 21 April 2021

Abstract

Quality assessment of audiovisual (AV) signals is important from the perspective of system design, optimization, and management of a modern multimedia communication system. However, automatic prediction of AV quality via the use of computational models remains challenging. In this context, machine learning (ML) appears to be an attractive alternative to the traditional approaches. This is especially so when such assessment needs to be made in a no-reference fashion (i.e., when the original signal is unavailable). While the development of ML-based quality predictors is desirable, we argue that proper assessment and validation of such predictors is also crucial before they can be deployed in practice. To this end, we raise some fundamental questions about the current approach to ML-based model development for AV quality assessment and for signal processing in multimedia communication in general. We also identify specific limitations of the current validation strategy that have implications for the analysis and comparison of ML-based quality predictors. These include a lack of consideration of: (a) data uncertainty, (b) domain knowledge, (c) the explicit learning ability of the trained model, and (d) the interpretability of the resultant model. The primary goal of this article is therefore to shed some light on these factors. Our analysis and proposed recommendations are of particular importance in light of the significant interest in ML methods for multimedia signal processing (specifically in cases where human-labeled data is used) and the lack of discussion of these issues in the existing literature.

        Published In

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2
        May 2021
        410 pages
        ISSN:1551-6857
        EISSN:1551-6865
        DOI:10.1145/3461621

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 21 April 2021
        Accepted: 01 October 2020
        Revised: 01 August 2020
        Received: 01 January 2020
        Published in TOMM Volume 17, Issue 2

        Author Tags

        1. Audiovisual quality
        2. machine learning
        3. uncertainty
        4. validation

        Qualifiers

        • Research-article
        • Research
        • Refereed
