QoS Estimation and Prediction of Input Modality in Degraded IP Networks

Wireless Personal Communications

Abstract

This paper evaluates the impact of combined transcoding and packet loss degradation on speech used as input to an interactive voice response (IVR) service, and proposes a method for classifying user input according to speech quality. Careful optimization of a communication system and all of its segments needs to be considered, as the quality of the user’s experience is becoming a more prominent factor in the overall acceptance and desirability of modern services. In our research, an emulation environment was developed and the behavior of the IVR was analyzed under different packet loss and transcoding conditions. A set of frequently used vocoders was tested for performance with an automatic speech recognition module under degraded conditions. Further, a quality estimation classifier based on Gaussian mixture models was proposed to determine the user’s best input modality. Various training and test parameters were investigated to provide more detailed insight into input quality estimation for an IVR service working under error-prone conditions.
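To make the classification idea concrete, the following is a minimal sketch of maximum-likelihood classification with one Gaussian mixture model per quality class. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the scalar synthetic "features" standing in for real acoustic feature vectors, the two class labels, the number of mixture components, and the mapping from quality class to input modality (`speech` and `dtmf` are hypothetical names).

```python
import math
import random

VAR_FLOOR = 1e-3  # keeps a component from collapsing onto a single sample
TINY = 1e-300     # guards against log(0) and division by zero on underflow


def gauss_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm(data, k=2, iters=30, seed=0):
    """Fit a k-component 1-D GMM with plain EM; returns (weight, mean, var) triples."""
    rng = random.Random(seed)
    grand_mean = sum(data) / len(data)
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / len(data), VAR_FLOOR)
    means = rng.sample(data, k)      # initialise means from random samples
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(p), TINY)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), TINY)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, VAR_FLOOR)
            weights[j] = nj / len(data)
    return list(zip(weights, means, variances))


def avg_log_likelihood(gmm, frames):
    """Average per-frame log-likelihood of the frames under one GMM."""
    total = 0.0
    for x in frames:
        density = sum(w * gauss_pdf(x, m, v) for w, m, v in gmm)
        total += math.log(max(density, TINY))
    return total / len(frames)


def classify(models, frames):
    """Return the label whose GMM gives the frames the highest average log-likelihood."""
    return max(models, key=lambda label: avg_log_likelihood(models[label], frames))


# Synthetic stand-ins for features of clean and degraded speech (assumed values).
rng = random.Random(1)
train_good = [rng.gauss(0.0, 1.0) for _ in range(200)]
train_degraded = [rng.gauss(4.0, 1.5) for _ in range(200)]
models = {"good": fit_gmm(train_good), "degraded": fit_gmm(train_degraded)}

# Hypothetical mapping from estimated quality class to suggested input modality.
MODALITY = {"good": "speech", "degraded": "dtmf"}

utterance = [rng.gauss(4.0, 1.5) for _ in range(50)]
label = classify(models, utterance)
print(label, "->", MODALITY[label])
```

Scoring by average per-frame log-likelihood keeps utterances of different lengths comparable, which is one common design choice for this kind of classifier. A practical system would use multidimensional acoustic features and would train the class models on speech degraded under known transcoding and packet loss conditions.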

Acknowledgments

This research was partly supported by the European Social Fund as part of the EU Operational Programme for Human Resources Development for the period 2007–2013 and partly supported by the Slovene Research Agency (ARRS) under Contract Number P2-0069. We gratefully acknowledge their support.

Author information

Corresponding author

Correspondence to Tomaž Lovrenčič.

About this article

Cite this article

Lovrenčič, T., Štular, M., Kačič, Z. et al. QoS Estimation and Prediction of Input Modality in Degraded IP Networks. Wireless Pers Commun 80, 837–857 (2015). https://doi.org/10.1007/s11277-014-2044-0

  • DOI: https://doi.org/10.1007/s11277-014-2044-0
