Assessment of Machine Learning-Based Audiovisual Quality Predictors: Why Uncertainty Matters

Authors:

Manish Narwaria

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 2

Article No.: 45, Pages 1 - 22

https://doi.org/10.1145/3430376

Published: 21 April 2021

Abstract

Quality assessment of audiovisual (AV) signals is important for the design, optimization, and management of modern multimedia communication systems. However, automatic prediction of AV quality with computational models remains challenging. In this context, machine learning (ML) appears to be an attractive alternative to traditional approaches, especially when the assessment must be made in a no-reference fashion (i.e., when the original signal is unavailable). While the development of ML-based quality predictors is desirable, we argue that proper assessment and validation of such predictors is also crucial before they can be deployed in practice. To this end, we raise some fundamental questions about the current approach to ML-based model development for AV quality assessment and, more generally, for signal processing in multimedia communication. We also identify specific limitations of the current validation strategy that have implications for the analysis and comparison of ML-based quality predictors. These include a lack of consideration of (a) data uncertainty, (b) domain knowledge, (c) the explicit learning ability of the trained model, and (d) the interpretability of the resultant model. The primary goal of this article is therefore to shed light on these factors. Our analysis and recommendations are of particular importance in light of the significant interest in ML methods for multimedia signal processing (specifically in cases where human-labeled data is used) and the lack of discussion of these issues in the existing literature.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
    1. Cross-validation
    2. Machine learning algorithms

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 2

May 2021

410 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3461621

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 April 2021

Accepted: 01 October 2020

Revised: 01 August 2020

Received: 01 January 2020

Qualifiers

Research-article
Research
Refereed
