Visualization of Babble–Speech Interactions Using Andrews Curves

Circuits, Systems, and Signal Processing

Abstract

Visualizing multidimensional data such as mel frequency cepstral coefficients (MFCCs) is difficult, especially when the number of dimensions exceeds three. As a result, it becomes extremely difficult to spot trends in high-dimensional signal interactions. Andrews curves seem to aid the graphical analysis of high-dimensional data. This study examines the properties of babble in the feature domain as well as the effect of babble noise on the MFCCs of clean speech. Experiments have been conducted using two babble models: the overlapping conversation model and the overlapping speaker model. The purpose of this paper is to provide insight into the effect of babble noise on the first thirteen MFCCs of clean speech through the use of Andrews curves. The investigations give a visual comparison of the signals that exposes trends which conventional visualization methods do not reveal. The use of Andrews curves not only allows the signal to be observed but also allows statistical comparisons between signals. With a better understanding of the difference between the models, it would be possible to develop systems that are more robust in babble-corrupted environments.
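
As a minimal illustration of the technique named in the abstract, the sketch below evaluates the standard Andrews mapping, f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., for t in [-pi, pi]. It is not the authors' implementation. The 13-dimensional random vectors are placeholders standing in for MFCC frames, and the names andrews_curve, clean_frame and noisy_frame are hypothetical.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate the Andrews curve of feature vector x at angles t.

    f_x(t) = x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t)
             + x[3]*sin(2t) + x[4]*cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    curve = np.full_like(t, x[0] / np.sqrt(2.0))
    for i in range(1, x.size):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            curve += x[i] * np.sin(harmonic * t)
        else:
            curve += x[i] * np.cos(harmonic * t)
    return curve


# Placeholder data (assumption): random vectors, NOT real MFCCs.
rng = np.random.default_rng(0)
clean_frame = rng.normal(size=13)                       # stand-in for a clean-speech frame
noisy_frame = clean_frame + 0.3 * rng.normal(size=13)   # stand-in for a babble-corrupted frame

t = np.linspace(-np.pi, np.pi, 200)
clean_curve = andrews_curve(clean_frame, t)
noisy_curve = andrews_curve(noisy_frame, t)

# Trapezoidal estimate of the integral of (f_x - f_y)^2 over [-pi, pi].
diff_sq = (clean_curve - noisy_curve) ** 2
curve_dist_sq = np.sum((diff_sq[:-1] + diff_sq[1:]) / 2.0) * (t[1] - t[0])

# For the Andrews mapping this integral equals pi times the squared
# Euclidean distance between the two feature vectors.
vector_dist_sq = np.sum((clean_frame - noisy_frame) ** 2)
print(curve_dist_sq, np.pi * vector_dist_sq)
```

Because the integrated squared difference between two curves is proportional to the squared Euclidean distance between the underlying vectors, curves that lie close together correspond to feature vectors that are close together, which is what makes numerical as well as visual comparison possible. Plotting clean_curve and noisy_curve against t with any plotting library gives the visual comparison.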

Acknowledgments

The authors are thankful to The University of the West Indies for providing the necessary funding through Grant No. CRP.4.MAR11.4 to carry out research on the project “Development of Algorithms and Systems for Robust speech Recognition in the Noisy Environments.”

Author information

Corresponding author

Correspondence to Davinder Pal Sharma.

About this article

Cite this article

Atkins, J., Sharma, D.P. Visualization of Babble–Speech Interactions Using Andrews Curves. Circuits Syst Signal Process 35, 1313–1331 (2016). https://doi.org/10.1007/s00034-015-0123-4

  • DOI: https://doi.org/10.1007/s00034-015-0123-4
