Abstract
Recognition of isolated spoken digits is the core procedure for a large number of important applications, mainly in telephone-based services such as dialing, airline reservation, bank transactions and price quotation, that rely on speech alone. Spoken digit recognition is generally a challenging task because the signals last for a short period of time and some digits are often acoustically very similar to each other. The objective of this paper is to investigate the use of machine learning algorithms for digit recognition. We focus on the recognition of digits spoken in Portuguese; however, we note that our techniques are applicable to any language. We believe that the most important task for successfully recognizing spoken digits is attribute extraction. Audio data is composed of a huge number of very weak features, and most machine learning algorithms will not be able to build accurate classifiers from it. We show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition. The results are superior to those obtained with state-of-the-art methods using Mel-Frequency Cepstrum Coefficients (MFCC) for digit recognition. In particular, we show that the choice of the right attribute extraction method is more important than the specific classification paradigm, and that the right combination of classifier and attributes can provide almost perfect accuracy.
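To make the attribute extraction step concrete, the following is a minimal sketch of how LSF coefficients can be derived from one speech frame through linear prediction. It is an illustration under stated assumptions, not the implementation used in the paper: the function names (autocorrelation, levinson_durbin, lpc_to_lsf), the grid size, and the grid-plus-bisection root search are all choices made here for the example. The frame is assumed to be already windowed and non-silent, so that r[0] > 0.

```python
import math

def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of one windowed frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., ap] for A(z).
    Assumes r[0] > 0 (a non-silent frame)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a

def _bisect(f, lo, hi, iters=50):
    """Refine a sign change of f inside [lo, hi] by bisection."""
    neg_lo = f(lo) < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) < 0) == neg_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def lpc_to_lsf(a, grid=1024):
    """LSFs in radians, ascending, inside (0, pi), for A(z) = sum a[k] z^-k.
    Roots closer together than pi/grid may be missed (illustrative search)."""
    n = len(a)                               # n = p + 1, degree of P and Q
    ext = list(a) + [0.0]
    sym = [ext[k] + ext[n - k] for k in range(n + 1)]    # P(z) = A(z) + z^-n A(1/z)
    anti = [ext[k] - ext[n - k] for k in range(n + 1)]   # Q(z) = A(z) - z^-n A(1/z)

    # On the unit circle, P and Q reduce to real functions of the angle w
    # after removing the common phase factor exp(-j*w*n/2).
    def f_p(w):
        return sum(c * math.cos((n / 2 - k) * w) for k, c in enumerate(sym))

    def f_q(w):
        return sum(c * math.sin((n / 2 - k) * w) for k, c in enumerate(anti))

    # Interior grid points only, so the trivial roots at w = 0 and w = pi
    # are never counted as LSFs.
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    lsf = []
    for f in (f_p, f_q):
        neg = [f(w) < 0 for w in ws]
        for i in range(grid - 1):
            if neg[i] != neg[i + 1]:
                lsf.append(_bisect(f, ws[i], ws[i + 1]))
    return sorted(lsf)
```

For a frame, the three steps chain as lpc_to_lsf(levinson_durbin(autocorrelation(frame, p), p)), where the prediction order p (for example 10) is an assumption of this sketch rather than a value taken from the paper. For a stable prediction filter the result contains p values; a flat spectrum, A(z) = 1, gives LSFs evenly spaced at kπ/(p+1).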
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Silva, D.F., de Souza, V.M.A., Batista, G.E.A.P.A., Giusti, R. (2012). Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies. In: Pavón, J., Duque-Méndez, N.D., Fuentes-Fernández, R. (eds) Advances in Artificial Intelligence – IBERAMIA 2012. IBERAMIA 2012. Lecture Notes in Computer Science, vol 7637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34654-5_25
DOI: https://doi.org/10.1007/978-3-642-34654-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34653-8
Online ISBN: 978-3-642-34654-5
eBook Packages: Computer Science (R0)