Visualization of Babble–Speech Interactions Using Andrews Curves

Atkins, Jamin; Sharma, Davinder Pal

doi:10.1007/s00034-015-0123-4

Published: 08 July 2015

Volume 35, pages 1313–1331, (2016)

Circuits, Systems, and Signal Processing

Abstract

Visualizing multidimensional data such as mel frequency cepstral coefficients (MFCCs) is difficult, especially when the number of dimensions exceeds three. As a result, it becomes extremely difficult to spot trends in high-dimensional signal interactions. Andrews curves seem to aid graphical analysis of high-dimensional data. This study examines the properties of babble in the feature domain as well as the effect of babble noise on the MFCCs of clean speech. Experiments were conducted using two babble models: the overlapping conversation model and the overlapping speaker model. The purpose of this paper is to provide insight into the effect of babble noise on the first thirteen MFCCs of clean speech through the use of Andrews curves. The investigations give a visual comparison of the signals that exposes trends which conventional visualization methods do not. Andrews curves not only allow the signal to be observed but also allow statistical comparisons between signals. With a better understanding of the difference between the models, it would be possible to develop systems that are more robust in babble-corrupted environments.
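
To make the idea of an Andrews curve concrete, the following is a minimal sketch of the standard definition, f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., evaluated for t in [-pi, pi). It is not the authors' implementation. The function names `andrews_curve` and `curve_distance_sq` are hypothetical, and the two 13-dimensional vectors are synthetic stand-ins for MFCC frames, not data from the paper. The sketch also checks the property that makes statistical comparison possible: the integrated squared difference between two curves equals pi times the squared Euclidean distance between the underlying vectors.

```python
import numpy as np


def andrews_curve(x, t):
    """Evaluate the Andrews curve of feature vector x at angles t.

    f_x(t) = x[0]/sqrt(2) + x[1]*sin(t) + x[2]*cos(t) + x[3]*sin(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    # Constant term from the first coefficient.
    curve = np.full(t.shape, x[0] / np.sqrt(2.0))
    for k in range(1, x.size):
        harmonic = (k + 1) // 2                    # 1, 1, 2, 2, 3, 3, ...
        basis = np.sin if k % 2 == 1 else np.cos   # sin first, then cos
        curve = curve + x[k] * basis(harmonic * t)
    return curve


def curve_distance_sq(x, y, n_points=512):
    """Integrate (f_x(t) - f_y(t))^2 over one period on a uniform grid."""
    t = np.linspace(-np.pi, np.pi, n_points, endpoint=False)
    diff = andrews_curve(x, t) - andrews_curve(y, t)
    return (2.0 * np.pi / n_points) * np.sum(diff ** 2)


# Synthetic 13-dimensional vectors (assumption: placeholders for the
# first thirteen MFCCs of one clean frame and one babble-corrupted frame).
rng = np.random.default_rng(0)
clean_frame = rng.normal(size=13)
noisy_frame = clean_frame + 0.5 * rng.normal(size=13)

lhs = curve_distance_sq(clean_frame, noisy_frame)
rhs = np.pi * np.sum((clean_frame - noisy_frame) ** 2)
print(lhs, rhs)  # the two values should agree up to floating-point error
```

With thirteen coefficients the curve contains a constant term plus six sine/cosine pairs, so each frame becomes a single smooth line that can be overlaid with others. For plotting labeled frames directly, `pandas.plotting.andrews_curves` offers a ready-made alternative.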

Acknowledgments

The authors are thankful to The University of the West Indies for providing the necessary funding through Grant No. CRP.4.MAR11.4 to carry out research on the project “Development of Algorithms and Systems for Robust speech Recognition in the Noisy Environments.”

Author information

Authors and Affiliations

Department of Physics, The University of the West Indies, St. Augustine, Trinidad and Tobago
Jamin Atkins & Davinder Pal Sharma

Corresponding author

Correspondence to Davinder Pal Sharma.

About this article

Cite this article

Atkins, J., Sharma, D.P. Visualization of Babble–Speech Interactions Using Andrews Curves. Circuits Syst Signal Process 35, 1313–1331 (2016). https://doi.org/10.1007/s00034-015-0123-4

Received: 30 June 2014
Revised: 29 June 2015
Accepted: 30 June 2015
Published: 08 July 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s00034-015-0123-4
