Skip to main content
Log in

Phoneme recognition using an adaptive supervised manifold learning algorithm

  • Original Article
  • Published:
Neural Computing and Applications Aims and scope Submit manuscript

Abstract

To effectively handle speech data lying on a nonlinear manifold embedded in a high-dimensional acoustic space, in this paper, an adaptive supervised manifold learning algorithm based on locally linear embedding (LLE) for nonlinear dimensionality reduction is proposed to extract the low-dimensional embedded data representations for phoneme recognition. The proposed method aims to make the interclass dissimilarity maximized, while the intraclass dissimilarity minimized in order to promote the discriminating power and generalization ability of the low-dimensional embedded data representations. The performance of the proposed method is compared with five well-known dimensionality reduction methods, i.e., principal component analysis, linear discriminant analysis, isometric mapping (Isomap), LLE as well as the original supervised LLE. Experimental results on three benchmarking speech databases, i.e., the Deterding database, the DARPA TIMIT database, and the ISOLET E-set database, demonstrate that the proposed method obtains promising performance on the phoneme recognition task, outperforming the other used methods.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Fanty M, Cole R (1990) Spoken letter recognition. In: Proceedings of neural information processing systems, Denver, pp 220–226

  2. Kim D, Lee S, Kil R (1999) Auditory processing of speech signals for robust speech recognition in real-world noisy environments. IEEE Trans Speech Audio Process 7(1):55–69. doi:10.1109/89.736331

    Article  Google Scholar 

  3. Wang X, Paliwal KK (2003) Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition. Pattern Recogn 36(10):2429–2439. doi:10.1016/S0031-3203(03)00044-X

    Article  MATH  Google Scholar 

  4. Gas B, Zarader J, Chavy C, Chetouani M (2004) Discriminant neural predictive coding applied to phoneme recognition. Neurocomputing 56:141–166. doi:10.1016/j.neucom.2002.08.001

    Article  Google Scholar 

  5. Kwon OW, Lee TW (2004) Phoneme recognition using ICA-based feature extraction and transformation. Signal Process 84:1005–1019. doi:10.1016/j.sigpro.2004.03.004

    Article  MATH  Google Scholar 

  6. Dharanipragada S, Yapanel U, Rao B (2007) Robust feature extraction for continuous speech recognition using the MVDR spectrum estimation method. IEEE Trans Audio Speech Lang Process 15(1):224–234. doi:10.1109/TASL.2006.876776

    Article  Google Scholar 

  7. Garau G, Renals S (2008) Combining spectral representations for large-vocabulary continuous speech recognition. IEEE Trans Audio Speech Lang Process 16(3):508–518. doi:10.1109/TASL.2008.916519

    Article  Google Scholar 

  8. Hermansky H (1990) Perceptual linear predictive (PLP) analysis of speech. J Acoust Soc Am 87(4):1738–1752. doi:10.1121/1.399423

    Article  Google Scholar 

  9. Davis S, Mermelstein P (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process 28(4):357–366

    Article  Google Scholar 

  10. Partridge M, Calvo R (1998) Fast dimensionality reduction and simple PCA. Intell Data Anal 2(3):292–298. doi:10.1.1.26.8709

    Google Scholar 

  11. Fukunaga K (1990) Introduction to statistical pattern recognition. Academic Press, Boston

    MATH  Google Scholar 

  12. Kocsor A, Toth L, Kuba A, Kovacs K, Jelasity M, Gyimothy T, Csirik J (2000) A comparative study of several feature transformation and learning methods for phoneme classification. Int J Speech Technol 3(3–4):263–276

    Article  MATH  Google Scholar 

  13. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326. doi:10.1126/science.290.5500.2323

    Article  Google Scholar 

  14. Saul LK, Roweis ST (2003) Think globally, fit locally: unsupervised learning of nonlinear manifolds. J Mach Learn Res 4:119–155. doi:10.1162/153244304322972667

    MathSciNet  Google Scholar 

  15. Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323. doi:10.1126/science.290.5500.2319

    Article  Google Scholar 

  16. Jain V, Saul LK (2004) Exploratory analysis and visualization of speech and music by locally linear embedding. In: Proceedings of 2004 IEEE international conference on acoustics, speech, and signal processing, Montreal, pp 984–987

  17. Jansen A, Niyogi P (2005) A geometric perspective on speech sounds. Technical report, TR-2005-08, University of Chicago

  18. Duraiswami R, Raykar VC (2005) The manifolds of spatial hearing. In: Proceedings of 2005 IEEE International conference on acoustics, speech, and signal processing, Philadelphia, pp 285–288

  19. Jansen A, Niyogi P (2006) Intrinsic Fourier analysis on the manifold of speech sounds. In: Proceedings of 2006 IEEE international conference on acoustics, speech, and signal processing, Toulouse, pp 241–244

  20. Errity A, McKenna J (2006) An investigation of manifold learning for speech analysis. In: Proceedings of 9th international conference on spoken language processing, Pittsburgh, pp 2506–2509

  21. Xu W, Lifang X, Dan Y, Zhiyan H (2008) Speech visualization based on locally linear embedding (LLE) for the hearing impaired. In: Proceedings of international conference on biomedical engineering and informatics, Sanya, Hainan, pp 502–505

  22. Tompkins F, Wolfe P (2009) Approximate intrinsic Fourier analysis of speech. In: Proceedings of Interspeech-2009, Brighton, United Kingdom, pp 120–123

  23. Kim J, Lee S, Narayanan S (2010) An exploratory study of manifolds of emotional speech. In: Proceedings of 2010 IEEE international conference on acoustics, speech, and signal processing, Dallas, Texas, USA, pp 5142–5145

  24. Mukherjee SN (2002) Locally linear embedding for speech recognition. Dissertation, Churchill College, University of Cambridge

  25. Errity A, McKenna J (2007) A comparative study of linear and nonlinear dimensionality reduction for speaker identification. In: Proceedings of 15th international conference on digital signal processing, Cardiff, Wales, pp 587–590

  26. Errity A, McKenna J, Kirkpatrick B (2007) Manifold learning-based feature transformation for phone classification. In: Proceedings of ISCA tutorial and research workshop, nonlinear speech processing, Paris, pp 132–141

  27. de Ridder D, Duin RPW (2002) Locally linear embedding for classification. Technical report PH-2002-01, Pattern Recognition Group, Department of Imaging Science & Technology, Delft University of Technology, Delft, The Netherlands

  28. de Ridder D, Kouropteva O, Okun O, Pietikäinen M, Duin RPW (2003) Supervised locally linear embedding. In: Proceedings of 13th international conference on artificial neural networks, Istanbul, Turkey, pp 333–341

  29. Kayo O (2006) Locally linear embedding algorithm extensions and applications. Dissertation, Faculty of Technology, University of Oulu

  30. Li B, Zheng CH, Huang DS (2008) Locally linear discriminant embedding: an efficient method for face recognition. Pattern Recogn 41(12):3813–3821. doi:10.1016/j.patcog.2008.05.027

    Article  MATH  Google Scholar 

  31. Li CG, Guo J (2006) Supervised Isomap with explicit mapping. In: Proceedings of 2006 international conference on innovative computing, information and control, Beijing, pp 345–348

  32. Chang H, Yeung DY (2006) Locally linear metric adaptation with application to semi-supervised clustering and image retrieval. Pattern Recogn 39(7):1253–1264. doi:10.1016/j.patcog.2005.12.012

    Article  MATH  Google Scholar 

  33. Kouropteva O, Okun O, Pietikäinen M (2003) Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine. In: Proceedings of 11th European symposium on artificial neural networks, Bruges, Belgium, pp 229–234

  34. Kouropteva O, Okun O, Pietikäinen M (2003) Supervised locally linear embedding algorithm for pattern recognition. In: Proceedings of the first Iberian conference on pattern recognition and image analysis, Mallorca, pp 386–394

  35. Liang D, Yang J, Zheng Z, Chang Y (2005) A facial expression recognition system based on supervised locally linear embedding. Pattern Recogn Lett 26(15):2374–2389. doi:10.1016/j.patrec.2005.04.011

    Article  Google Scholar 

  36. Wang M, Yang J, Xu ZJ, Chou KC (2005) SLLE for predicting membrane protein types. J Theor Biol 232:7–15. doi:10.1016/j.jtbi.2004.07.023

    Article  MathSciNet  Google Scholar 

  37. Pillati M, Viroli C (2005) Supervised locally linear embedding for classification: an application to gene expression data analysis. In: Zani S, Cerioli A (eds) Book of short papers, CLADAG2005, Parma, 6–8 Giugno, MUP, pp 147–150

  38. Bengio Y, Paiement JF, Vincent P (2004) Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering. In: Advances in neural information processing systems 16. MIT Press, Cambridge

  39. Platt J (2005) Fastmap, MetricMap, and Landmark MDS are all Nystrom algorithms. In: Proceedings of 10th international workshop on artificial intelligence and statistics, Barbados, pp 261–268

  40. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66. doi:10.1023/A:1022689900470

    Google Scholar 

  41. Deterding DH (1989) Speaker normalisation for automatic speech recognition. PhD thesis, Department of Engineering, University of Cambridge

  42. Garofalo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL (1990) The DARPA TIMIT Acoustic-phonetic continuous speech corpus CDROM.NIST

  43. Cole RA, Muthusamy Y, Fanty MA (1990) The ISOLET spoken letter database. Technical report 90-004, Computer Science Department, Oregon Graduate Institute

  44. Robinson A (1989) Dynamic error propagation networks. PhD thesis, Department of Engineering, University of Cambridge

  45. Lee K, Hon H (1989) Speaker-independent phoneme recognition using hidden Markov models. IEEE Trans Acoust Speech Signal Process 37(11):1641–1648

    Article  Google Scholar 

  46. Fanty M, Cole R, Roginski K (1992) English alphabet recognition with telephone speech. In: Advances in neural information processing systems 4. Springer, New York, pp 199–206

  47. Su KY, Lee CH (1994) Speech recognition using weighted HMM and subspace projection approaches. IEEE Trans Speech Audio Process 2(1):69–79. doi:10.1109/89.260336

    Article  Google Scholar 

  48. Loizou PC, Spanias AS (1996) High performance alphabet recognition. IEEE Trans Speech Audio Process 4(6):430–445. doi:10.1109/89.544528

    Article  Google Scholar 

  49. Fanty M, Cole R (1990) Speaker-independent English alphabet recognition: experiments with the e-set. In: Proceedings of the first international conference on spoken language processing, Kobe, pp 1361–1364

  50. Kocsor A, Tóth L (2004) Kernel-based feature extraction with a speech technology application. IEEE Trans Signal Process 52(8):2250–2263. doi:10.1109/TSP.2004.830995

    Article  MathSciNet  Google Scholar 

  51. Sainath T, Ramabhadran B, Nahamoo D, Kanevsky D, Sethy A (2010) Sparse Representation Features for Speech Recognition. In: Proceedings of Interspeech-2010, Makuhari, Chiba, Japan, pp 2254–2257

Download references

Acknowledgments

The authors would like to thank all the anonymous reviewers and editors for their helpful comments and suggestions about the improvement of this paper. This work is supported by Zhejiang Provincial Natural Science Foundation of China under Grant No. Z1101048 and No. Y1111058.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shiqing Zhang.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhao, X., Zhang, S. Phoneme recognition using an adaptive supervised manifold learning algorithm. Neural Comput & Applic 21, 1501–1515 (2012). https://doi.org/10.1007/s00521-012-1032-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-012-1032-0

Keywords

Navigation