A noise robust speech features extraction approach in multidimensional cortical representation using multilinear principal component analysis

International Journal of Speech Technology

Abstract

In this paper, we propose a new noise-robust feature extraction method based on the multidimensional perceptual representation of speech in the primary auditory cortex (AI). Because this representation encodes different features along different dimensions, it increases the discrimination power of the system. However, it also greatly increases the volume of information, which gives rise to the curse of dimensionality. We therefore propose a second-level feature extraction stage that makes the features suitable for classifier training and robust to noise. This stage targets two main concerns, dimensionality reduction and noise robustness, using a singular value decomposition (SVD) approach. A multilinear principal component analysis framework based on higher-order SVD is proposed to extract the final features in the high-dimensional AI output space. Phoneme classification results on different phoneme subsets of the TIMIT database contaminated with additive noise confirm that the proposed method not only increases the classification rate considerably but also enhances robustness significantly compared with conventional Mel-frequency cepstral coefficient and cepstral mean normalization features used to train the same classifier.
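
As a rough illustration of the second-level stage described in the abstract, the sketch below reduces a set of tensor-valued samples with a single-pass, higher-order-SVD style truncation. It is not the authors' implementation: full multilinear principal component analysis normally refines the projection matrices iteratively, and the abstract gives neither the tensor dimensions nor the retained ranks. The function names (`unfold`, `fit_mode_projections`, `project`), the sample shape (5, 6, 16), and the ranks (2, 3, 4) are all hypothetical placeholders.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def fit_mode_projections(samples, ranks):
    """Estimate one orthonormal projection matrix per tensor mode.

    samples: array of shape (n_samples, I1, ..., IN)
    ranks:   retained dimension per mode, (P1, ..., PN)  (assumed values)
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    projections = []
    for mode, rank in enumerate(ranks):
        # Axis 0 indexes samples, so tensor mode k lives on axis k + 1.
        matrix = unfold(centered, mode + 1)
        # Left singular vectors span the dominant subspace of this mode.
        u, _, _ = np.linalg.svd(matrix, full_matrices=False)
        projections.append(u[:, :rank])
    return mean, projections


def project(samples, mean, projections):
    """Project centered samples onto the retained subspace of every mode."""
    out = samples - mean
    for mode, u in enumerate(projections):
        # Contract axis mode + 1 with u, then move the new axis back in place.
        out = np.tensordot(out, u, axes=([mode + 1], [0]))
        out = np.moveaxis(out, -1, mode + 1)
    # Flatten each reduced core tensor into a feature vector.
    return out.reshape(len(samples), -1)


# Synthetic stand-in for cortical-model outputs (placeholder shape).
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 5, 6, 16))
mean, projections = fit_mode_projections(data, ranks=(2, 3, 4))
features = project(data, mean, projections)
print(features.shape)  # (20, 24)
```

Discarding the trailing singular vectors in each mode is what gives this kind of projection its dual role: it shrinks each sample from 5 × 6 × 16 = 480 values to 2 × 3 × 4 = 24, and it drops the low-energy directions where additive noise tends to account for a larger share of the variance.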

Author information

Corresponding author

Correspondence to Mehdi Fartash.

About this article

Cite this article

Fartash, M., Setayeshi, S. & Razzazi, F. A noise robust speech features extraction approach in multidimensional cortical representation using multilinear principal component analysis. Int J Speech Technol 18, 351–365 (2015). https://doi.org/10.1007/s10772-015-9274-8

  • DOI: https://doi.org/10.1007/s10772-015-9274-8
