Live Monitoring of Speech Quality of Public Addressing Network Speakers: A Preliminary Study

Authors:
Elhard James Kumalija, University of Hyogo, Japan
Yukikazu Nakamoto, University of Hyogo, Japan

AICCC '20: Proceedings of the 2020 3rd Artificial Intelligence and Cloud Computing Conference, December 2020, Pages 97–101. https://doi.org/10.1145/3442536.3442551

Published: 15 March 2021

ABSTRACT

Network speakers are increasingly installed in public spaces such as train stations, schools, and hospitals, where they are used for announcements and background music. Network performance can affect the quality of the announcement speech heard from a network speaker. In this study, a deep neural network method is proposed for live monitoring of the quality of speaker output as perceived by occupants of the public space. A single-ended speech quality assessment method is proposed because, in this application, no reference speech is available for the assessment. The network node (end point) of a network speaker usually has limited memory and computing resources. Therefore, a compact deep neural network architecture and post-training quantization were examined as compression techniques for saving memory and accelerating computation. The proposed method and ITU-T P.563, both single-ended methods, were compared using PESQ, an end-to-end assessment method, as the baseline. The Pearson correlation coefficient of the estimated mean opinion score was 0.710 for the proposed method and 0.40 for P.563, and the mean squared error was 0.154 and 0.319, respectively. The proposed method thus performed better than the ITU-T recommended P.563 method.
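The two figures of merit reported in the abstract, Pearson correlation and mean squared error against a PESQ baseline, can be illustrated with a minimal sketch. The function names and the score values below are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, left unnormalized because the factor cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def mean_squared_error(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean of the squared differences between paired scores."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("need two non-empty, equal-length sequences")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Hypothetical MOS values: baseline scores (e.g. from PESQ) and the
    # scores estimated by a single-ended model for the same utterances.
    baseline_mos = [1.8, 2.5, 3.1, 3.9, 4.3]
    estimated_mos = [2.0, 2.4, 3.4, 3.6, 4.1]
    print("Pearson r:", pearson_r(baseline_mos, estimated_mos))
    print("MSE:", mean_squared_error(baseline_mos, estimated_mos))
```

A higher correlation together with a lower mean squared error indicates closer agreement with the baseline, which is the sense in which the abstract compares the proposed method with P.563.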

References

Andrew A. Catellier and Stephen D. Voran. 2020. Wawenets: A No-Reference Convolutional Waveform-Based Approach to Estimating Narrowband and Wideband Speech Quality. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 331–335. DOI: https://doi.org/10.1109/ICASSP40776.2020.9054204
Benjamin Cauchi, Kai Siedenburg, Joao F. Santos, Tiago H. Falk, Simon Doclo, and Stefan Goetze. 2019. Non-Intrusive Speech Quality Prediction Using Modulation Energies and LSTM-Network. IEEE/ACM Trans. Audio Speech Lang. Process. 27, 7 (July 2019), 1151–1163. DOI: https://doi.org/10.1109/TASLP.2019.2912123
Hannes Gamper, Chandan K A Reddy, Ross Cutler, Ivan J Tashev, and Johannes Gehrke. 2019. Intrusive and Non-Intrusive Perceptual Speech Quality Assessment Using a Convolutional Neural Network. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE, New Paltz, NY. DOI: https://doi.org/10.1109/WASPAA.2019.8937202
Philipp Gysel, Jon Pimentel, Mohammad Motamedi, and Soheil Ghiasi. 2018. Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks. IEEE Trans. Neural Networks Learn. Syst. 29, 11 (November 2018), 5784–5789. DOI: https://doi.org/10.1109/TNNLS.2018.2808319
Rainer Huber and Birger Kollmeier. 2006. PEMO-Q - A new method for objective audio quality assessment using a model of auditory perception. IEEE Trans. Audio, Speech Lang. Process. 14, 6 (November 2006), 1902–1911. DOI: https://doi.org/10.1109/TASL.2006.883259
International Telecommunication Union. 1996. Methods for subjective determination of transmission quality. ITU-T Recomm. P.800 (1996).
International Telecommunication Union. 2001. Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. ITU-T Recomm. P.862 (2001).
International Telecommunication Union. 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T Recomm. P.835 (2003).
International Telecommunication Union. 2004. Single-ended method for objective speech quality assessment in narrow-band telephony applications. ITU-T Recomm. P.563 (2004).
International Telecommunication Union. 2011. Perceptual Objective Listening Quality Assessment: An advanced objective perceptual method for end-to-end listening speech quality evaluation of fixed, mobile, and IP-based networks and speech codecs covering narrowband, wideband, and super-wideband. ITU-T Recomm. P.863 (2011).
Rafidul Islam, Ashequr Rahman, Numan Hasan, A. N. M. Shahriyar Hossain, Ahmed Nazim Uddin, and Mohammad Ariful Haque. 2017. Non-intrusive objective evaluation of speech quality in noisy condition. In Proceedings of 9th International Conference on Electrical and Computer Engineering, ICECE 2016, IEEE, 586–589. DOI: https://doi.org/10.1109/ICECE.2016.7853988
Thilo Thiede, William C. Treurniet, Roland Bitto, Christian Schmidmer, Thomas Sporer, John G. Beerends, and Catherine Colomes. 2000. PEAQ - The ITU standard for objective measurement of perceived audio quality. J. Audio Eng. Soc. 48, 1/2 (2000), 3–29.
Brian McFee, Colin Raffel, Dawen Liang, Daniel Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and Music Signal Analysis in Python. In Proceedings of the 14th Python in Science Conference. DOI: https://doi.org/10.25080/majora-7b98e3ed-003
Dushyant Sharma, Yu Wang, Patrick A. Naylor, and Mike Brookes. 2016. A data-driven non-intrusive measure of speech quality and intelligibility. Speech Commun. 80 (June 2016), 84–94. DOI: https://doi.org/10.1016/j.specom.2016.03.005
Ana Paula Couto da Silva, Martín Varela, Edmundo de Souza e Silva, Rosa M. M. Leão, and Gerardo Rubino. 2008. Quality assessment of interactive voice applications. Comput. Networks 52, 6 (April 2008), 1179–1192. DOI: https://doi.org/10.1016/j.comnet.2008.01.002
Cassia Valentini-Botinhao. 2016. Reverberant speech database for training speech dereverberation algorithms and TTS models, 2016 [dataset]. DOI: https://doi.org/10.7488/ds/1425
Cassia Valentini-Botinhao. 2017. Noisy speech database for training speech enhancement algorithms and TTS models. (2017). DOI: https://doi.org/10.7488/ds/2117
Raspberry Pi 4 Model B specifications – Raspberry Pi. Retrieved November 10, 2020 from https://www.raspberrypi.org/products/raspberry-pi-4-model-b/specifications/?resellerType=home
tensorflow/tensorflow/lite at master · tensorflow/tensorflow · GitHub. Retrieved October 1, 2020 from https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite

Published in

AICCC '20: Proceedings of the 2020 3rd Artificial Intelligence and Cloud Computing Conference
December 2020
114 pages
ISBN:9781450388832
DOI:10.1145/3442536

Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DNN network optimization
Embedded DNN
Single-end speech quality evaluation
network speakers
Qualifiers
- research-article
- Research
- Refereed limited