DOI: 10.1145/3442536.3442551
research-article

Live Monitoring of Speech Quality of Public Addressing Network Speakers: A Preliminary Study

Published: 15 March 2021

ABSTRACT

A growing number of network speakers are being installed in public spaces such as train stations, schools, and hospitals. These speakers are used for announcements and for playing background music. Network performance can affect the quality of the announcement speech heard from a network speaker. In this study, a deep neural network method is proposed for live monitoring of the quality of speaker output as perceived by occupants of the public space. A single-ended speech quality assessment method is used because, in this application, no reference speech is available for comparison. The network node (end point) of a network speaker usually has limited memory and computing resources. Therefore, a compact deep neural network architecture and post-training quantization were examined as compression techniques for saving memory and accelerating computation. PESQ, an end-to-end assessment method, served as the baseline for comparing two single-ended methods: the proposed method and ITU-T P.563. The Pearson correlation coefficient of the estimated mean opinion score was 0.710 for the proposed method and 0.40 for P.563, and the mean squared error was 0.154 and 0.319, respectively. The proposed method thus outperformed the ITU-T recommended P.563 method.
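As an illustration of the two evaluation metrics named in the abstract, the following minimal sketch computes the Pearson correlation coefficient and the mean squared error between baseline scores and predicted scores. It is not the authors' evaluation code. The function names and the sample score lists are hypothetical, chosen only to show the arithmetic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical values, for illustration only (not data from the paper):
# baseline MOS from an intrusive method such as PESQ, and MOS estimated
# by a single-ended model for the same five utterances.
baseline_mos = [1.8, 2.5, 3.1, 3.9, 4.2]
predicted_mos = [2.0, 2.4, 3.3, 3.6, 4.3]

r = pearson(baseline_mos, predicted_mos)
err = mse(baseline_mos, predicted_mos)
print(f"Pearson r = {r:.3f}, MSE = {err:.3f}")
```

A higher correlation and a lower mean squared error both indicate closer agreement with the baseline, which is the sense in which the abstract compares the proposed method with P.563.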

References

  1. Andrew A. Catellier and Stephen D. Voran. 2020. Wawenets: A No-Reference Convolutional Waveform-Based Approach to Estimating Narrowband and Wideband Speech Quality. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 331–335. DOI:https://doi.org/10.1109/ICASSP40776.2020.9054204
  2. Benjamin Cauchi, Kai Siedenburg, Joao F. Santos, Tiago H. Falk, Simon Doclo, and Stefan Goetze. 2019. Non-Intrusive Speech Quality Prediction Using Modulation Energies and LSTM-Network. IEEE/ACM Trans. Audio Speech Lang. Process. 27, 7 (July 2019), 1151–1163. DOI:https://doi.org/10.1109/TASLP.2019.2912123
  3. Hannes Gamper, Chandan K A Reddy, Ross Cutler, Ivan J Tashev, and Johannes Gehrke. 2019. Intrusive and Non-Intrusive Perceptual Speech Quality Assessment Using a Convolutional Neural Network. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE, New Paltz, NY. DOI:https://doi.org/10.1109/WASPAA.2019.8937202
  4. Philipp Gysel, Jon Pimentel, Mohammad Motamedi, and Soheil Ghiasi. 2018. Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks. IEEE Trans. Neural Networks Learn. Syst. 29, 11 (November 2018), 5784–5789. DOI:https://doi.org/10.1109/TNNLS.2018.2808319
  5. Rainer Huber and Birger Kollmeier. 2006. PEMO-Q-A new method for objective audio quality assessment using a model of auditory perception. IEEE Trans. Audio, Speech Lang. Process. 14, 6 (November 2006), 1902–1911. DOI:https://doi.org/10.1109/TASL.2006.883259
  6. INTERNATIONAL TELECOMMUNICATION UNION. 1996. Methods for subjective determination of transmission quality. ITU-T Recomm. P.800 (1996).
  7. INTERNATIONAL TELECOMMUNICATION UNION. 2001. Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. ITU-T Recomm. P.862 (2001).
  8. INTERNATIONAL TELECOMMUNICATION UNION. 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T Recomm. P.835 (2003).
  9. INTERNATIONAL TELECOMMUNICATION UNION. 2004. Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T Recomm. P.563 (2004).
  10. INTERNATIONAL TELECOMMUNICATION UNION. 2011. Perceptual Objective Listening Quality Assessment: An advanced objective perceptual method for end-to-end listening speech quality evaluation of fixed, mobile, and IP-based networks and speech codecs covering narrowband, wideband, and super-wideband. ITU-T Recomm. P.863 (2011).
  11. Rafidul Islam, Ashequr Rahman, Numan Hasan, A. N. M. Shahriyar Hossain, Ahmed Nazim Uddin, and Mohammad Ariful Haque. 2017. Non-intrusive objective evaluation of speech quality in noisy condition. In Proceedings of 9th International Conference on Electrical and Computer Engineering, ICECE 2016, IEEE, 586–589. DOI:https://doi.org/10.1109/ICECE.2016.7853988
  12. Catherine Colomes, Thilo Thiede, William C. Treurniet, Roland Bitto, Christian Schmidmer, Thomas Sporer, and John G. Beerends. 2000. PEAQ-The ITU standard for objective measurement of perceived audio quality. J. Audio Eng. Soc. 48, 1/2 (2000), 3–29.
  13. Brian McFee, Colin Raffel, Dawen Liang, Daniel Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and Music Signal Analysis in Python. In Proceedings of the 14th Python in Science Conference. DOI:https://doi.org/10.25080/majora-7b98e3ed-003
  14. Dushyant Sharma, Yu Wang, Patrick A. Naylor, and Mike Brookes. 2016. A data-driven non-intrusive measure of speech quality and intelligibility. Speech Commun. 80 (June 2016), 84–94. DOI:https://doi.org/10.1016/j.specom.2016.03.005
  15. Ana Paula Couto da Silva, Martín Varela, Edmundo de Souza e Silva, Rosa M.M. Leão, and Gerardo Rubino. 2008. Quality assessment of interactive voice applications. Comput. Networks 52, 6 (April 2008), 1179–1192. DOI:https://doi.org/10.1016/j.comnet.2008.01.002
  16. Cassia Valentini-Botinhao. 2016. Reverberant speech database for training speech dereverberation algorithms and TTS models, 2016 [dataset]. DOI:https://doi.org/10.7488/ds/1425
  17. Cassia Valentini-Botinhao. 2017. Noisy speech database for training speech enhancement algorithms and TTS models. (2017). DOI:https://doi.org/10.7488/ds/2117
  18. Raspberry Pi 4 Model B specifications – Raspberry Pi. Retrieved November 10, 2020 from https://www.raspberrypi.org/products/raspberry-pi-4-model-b/specifications/?resellerType=home
  19. tensorflow/tensorflow/lite at master · tensorflow/tensorflow · GitHub. Retrieved October 1, 2020 from https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite

  • Published in

    AICCC '20: Proceedings of the 2020 3rd Artificial Intelligence and Cloud Computing Conference
    December 2020
    114 pages
    ISBN:9781450388832
    DOI:10.1145/3442536

    Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited