Abstract
This paper introduces an accurate time–domain approach to modeling and classifying Malayalam Consonant-Vowel (CV) speech unit waveforms. The technique is based on statistical models of the Reconstructed State Space (RSS). A feature extraction method using RSS-based State Space Point Distribution (SSPD) parameters is studied. Results of simulation experiments performed on the Malayalam CV speech databases using Artificial Neural Network (ANN) and k-Nearest Neighbor (k-NN) classifiers are also presented. The results indicate that the RSS approach can increase speaker-independent consonant recognition accuracy.
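As a rough illustration of the kind of feature the abstract describes, the sketch below builds a two-dimensional reconstructed state space by time-delay embedding and then counts how many points fall in each cell of a uniform grid. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define SSPD, so the grid-count reading, the embedding dimension of 2, the peak normalization, and the names `delay_embed`, `sspd_features`, `bins`, and `lag` are all hypothetical choices made for this example.

```python
import numpy as np


def delay_embed(signal, dim=2, lag=1):
    """Time-delay embedding: row n is [x[n], x[n+lag], ..., x[n+(dim-1)*lag]]."""
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim and lag")
    return np.column_stack([x[i * lag: i * lag + n_points] for i in range(dim)])


def sspd_features(signal, bins=20, lag=1):
    """Grid-count feature vector over a 2-D reconstructed state space.

    Assumption: SSPD is read here as the fraction of state space points
    falling in each cell of a bins x bins grid over [-1, 1] x [-1, 1].
    """
    x = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak  # scale the waveform into [-1, 1]
    points = delay_embed(x, dim=2, lag=lag)
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=[[-1, 1], [-1, 1]]
    )
    return counts.ravel() / counts.sum()
```

Under these assumptions, each CV waveform maps to a fixed-length vector of `bins * bins` values that sums to one, which could then be passed to a k-NN or neural network classifier.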
Cite this article
Thasleema, T.M., Prajith, P. & Narayanan, N.K. Time–domain non-linear feature parameter for consonant classification. Int J Speech Technol 15, 227–239 (2012). https://doi.org/10.1007/s10772-012-9136-6
DOI: https://doi.org/10.1007/s10772-012-9136-6