QoS Estimation and Prediction of Input Modality in Degraded IP Networks

Lovrenčič, Tomaž; Štular, Mitja; Kačič, Zdravko; Žgank, Andrej

doi:10.1007/s11277-014-2044-0

Published: 09 September 2014

Wireless Personal Communications, Volume 80, pages 837–857 (2015)


Abstract

This paper evaluates the impact of combined transcoding and packet loss degradation on speech used as input to an interactive voice response (IVR) service, and proposes a method for classifying user input according to speech quality. Because the quality of the user's experience is becoming a more prominent factor in the overall acceptance and desirability of modern services, a communication system and all of its segments need to be carefully optimized. Within our research, an emulation environment was developed and the behavior of the IVR service was analyzed under different packet loss and transcoding conditions. A set of frequently used vocoders was tested for its performance with an automatic speech recognition module under degraded conditions. Further, a quality estimation classifier based on Gaussian mixture models was proposed to determine the user's best input modality. Various training and test parameters were investigated to provide more detailed insight into input quality estimation for an IVR service working under error-prone conditions.
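The abstract does not give the classifier's details, so the following is only a minimal sketch of how a Gaussian-mixture-based quality classifier can work in general: one mixture model per quality class, with the class chosen by maximum log-likelihood over the input's feature frames. The class labels, the one-dimensional features, and all model parameters below are hypothetical and hand-picked for illustration; they are not taken from the paper.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of 1-D feature values under a Gaussian mixture.

    `gmm` is a list of (weight, mean, variance) tuples (hypothetical format).
    """
    total = 0.0
    for x in frames:
        # Log of each weighted Gaussian component density at x.
        logs = [
            math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
            for w, mu, var in gmm
        ]
        # Log-sum-exp over components for numerical stability.
        m = max(logs)
        total += m + math.log(sum(math.exp(v - m) for v in logs))
    return total


def classify_quality(frames, class_gmms):
    """Return the label whose mixture gives the highest log-likelihood."""
    return max(class_gmms, key=lambda label: gmm_log_likelihood(frames, class_gmms[label]))


# Hypothetical models: in practice these would be trained on features
# extracted from clean and degraded speech, not set by hand.
CLASS_GMMS = {
    "good": [(0.6, 0.0, 1.0), (0.4, 1.0, 0.5)],
    "degraded": [(0.5, 4.0, 1.0), (0.5, 6.0, 2.0)],
}

if __name__ == "__main__":
    print(classify_quality([0.1, 0.5, -0.2], CLASS_GMMS))
    print(classify_quality([4.5, 5.5, 6.0], CLASS_GMMS))
```

In a service such as the one described, the predicted class could then drive the choice of input modality, for example by falling back from speech to another input method when the input is classified as degraded; that mapping is likewise an assumption here.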



Acknowledgments

This research was partly supported by the European Social Fund as part of the EU Operational Programme for Human Resources Development for the period 2007–2013 and partly supported by the Slovene Research Agency (ARRS) under Contract Number P2-0069. We gratefully acknowledge their support.

Author information

Authors and Affiliations

Telekom Slovenije d.d., Ljubljana, Slovenia
Tomaž Lovrenčič & Mitja Štular
Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia
Zdravko Kačič & Andrej Žgank


Corresponding author

Correspondence to Tomaž Lovrenčič.


About this article

Cite this article

Lovrenčič, T., Štular, M., Kačič, Z. et al. QoS Estimation and Prediction of Input Modality in Degraded IP Networks. Wireless Pers Commun 80, 837–857 (2015). https://doi.org/10.1007/s11277-014-2044-0

Issue Date: January 2015
