Phoneme recognition using an adaptive supervised manifold learning algorithm

Zhao, Xiaoming; Zhang, Shiqing

doi:10.1007/s00521-012-1032-0

Original Article
Published: 10 July 2012

Volume 21, pages 1501–1515 (2012)

Neural Computing and Applications

Abstract

To handle speech data lying on a nonlinear manifold embedded in a high-dimensional acoustic space, this paper proposes an adaptive supervised manifold learning algorithm for nonlinear dimensionality reduction, based on locally linear embedding (LLE), which extracts low-dimensional embedded data representations for phoneme recognition. The proposed method aims to maximize interclass dissimilarity while minimizing intraclass dissimilarity, in order to improve the discriminating power and generalization ability of the low-dimensional representations. Its performance is compared with five well-known dimensionality reduction methods: principal component analysis (PCA), linear discriminant analysis (LDA), isometric mapping (Isomap), LLE, and the original supervised LLE. Experimental results on three benchmark speech databases (the Deterding database, the DARPA TIMIT database, and the ISOLET E-set database) show that the proposed method achieves promising performance on the phoneme recognition task and outperforms the other methods compared.
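The abstract does not state the update rule of the adaptive method, so the sketch below illustrates only the idea it builds on: the original supervised LLE (SLLE) of de Ridder et al., in which the Euclidean distance matrix D is enlarged for pairs of points with different labels, D' = D + α·max(D)·Λ with Λ_ij = 1 when y_i ≠ y_j, before the usual LLE neighbor search, weight fitting, and eigen-decomposition. This is a minimal sketch under stated assumptions, not the authors' algorithm: the use of NumPy, the function names, the regularization scheme, and the values of α, k and `reg` are illustrative choices, and the synthetic data merely stands in for acoustic feature vectors.

```python
import numpy as np

# Sketch of the ORIGINAL supervised LLE (de Ridder et al.), not the paper's
# adaptive variant; function names and default parameters are hypothetical.


def supervised_neighbors(X, y, n_neighbors, alpha):
    """Indices of the n_neighbors nearest points under the SLLE distance.

    D' = D + alpha * max(D) * Lambda, with Lambda[i, j] = 1 if y[i] != y[j].
    alpha = 0 gives plain LLE neighbors; with alpha = 1, neighbors come from
    the same class whenever it has at least n_neighbors other members
    (barring distance ties).
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    Lambda = (y[:, None] != y[None, :]).astype(float)
    D = D + alpha * D.max() * Lambda
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbor
    return np.argsort(D, axis=1)[:, :n_neighbors]


def lle_weights(X, neighbors, reg=1e-3):
    """Weights that best reconstruct each point from its neighbors (rows sum to 1)."""
    n, k = neighbors.shape
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]  # neighbors centred on x_i, shape (k, D)
        C = Z @ Z.T                 # local Gram matrix, shape (k, k)
        trace = np.trace(C)
        # Regularize so the system stays solvable when C is (near-)singular.
        C += np.eye(k) * (reg * trace if trace > 0 else reg)
        w = np.linalg.solve(C, np.ones(k))
        W[i, neighbors[i]] = w / w.sum()  # enforce the sum-to-one constraint
    return W


def slle(X, y, n_neighbors=8, n_components=2, alpha=0.5, reg=1e-3):
    """Supervised LLE embedding of labelled data X with shape (n, D)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    neighbors = supervised_neighbors(X, y, n_neighbors, alpha)
    W = lle_weights(X, neighbors, reg)
    I_minus_W = np.eye(len(X)) - W
    M = I_minus_W.T @ I_minus_W     # embedding cost matrix, symmetric PSD
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    # Drop the bottom eigenvector (eigenvalue ~0, constant when the neighbor
    # graph is connected) and keep the next n_components as coordinates.
    return eigvecs[:, 1:n_components + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic 10-dimensional classes standing in for acoustic features.
    X = np.vstack([rng.normal(0.0, 1.0, (30, 10)), rng.normal(3.0, 1.0, (30, 10))])
    y = np.array([0] * 30 + [1] * 30)
    Y = slle(X, y, n_neighbors=8, n_components=2, alpha=1.0)
    print(Y.shape)  # (60, 2)
```

Setting α between 0 and 1 trades off plain LLE (α = 0, unsupervised) against fully class-restricted neighborhoods (α = 1). Classifying held-out phonemes would additionally require an out-of-sample mapping for test vectors, which this sketch does not include.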

Acknowledgments

The authors thank the anonymous reviewers and editors for their helpful comments and suggestions on improving this paper. This work is supported by the Zhejiang Provincial Natural Science Foundation of China under Grant Nos. Z1101048 and Y1111058.

Author information

Authors and Affiliations

School of Physics and Electronic Engineering, Taizhou University, Taizhou, 318000, People’s Republic of China
Shiqing Zhang
Department of Computer Science, Taizhou University, Taizhou, 318000, People’s Republic of China
Xiaoming Zhao

Corresponding author

Correspondence to Shiqing Zhang.

About this article

Cite this article

Zhao, X., Zhang, S. Phoneme recognition using an adaptive supervised manifold learning algorithm. Neural Comput & Applic 21, 1501–1515 (2012). https://doi.org/10.1007/s00521-012-1032-0

Received: 14 October 2009
Accepted: 21 June 2012
Published: 10 July 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s00521-012-1032-0
