Abstract
Automatic Speech Recognition (ASR) is a broad research area that absorbs many efforts from the research community. The interest on Multilingual Systems arouses in the Basque Country because there are three official languages (Basque, Spanish, and French), and there is much linguistic interaction among them, even if Basque has very different roots than the other two languages. The development of Multilingual Large Vocabulary Continuous Speech Recognition systems involves issues as: Language Identification, Acoustic Phonetic Decoding, Language Modeling or the development of appropriate Language Resources. This paper describes the development of a Language Identification (LID) system oriented to robust Multilingual Speech Recognition in the Basque context. The work presents hybrid strategies for LID, based on the selection of system elements by several classifiers and Discriminant Analysis improved with robust regularized covariance matrix estimation methods oriented to under-resourced languages and stochastic methods for speech recognition tasks (Hidden Markov Models and n-grams).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Schultz, T., Kirchhoff, N.: Multilingual Speech Processing. Elsevier, Amsterdam (2006)
Schultz, T., Waibel, A.: Multilingual and Crosslingual Speech Recognition. In: Proceedings of the DARPA Broadcast News. Workshop (1998)
Le, V.B., Besacier, L.: Automatic speech recognition for under-resourced languages: application to Vietnamese language. IEEE Transactions on Audio, Speech, and Language Processing 17(8), 1471–1482 (2009)
Seng, S., Sam, S., Le, V.B., Bigi, B., Besacier, L.: Which Units For Acoustic and Language Mod-eling For Khmer Automatic Speech Recognition. In: 1st International Conference on Spoken Language Processing for Under-resourced languages Hanoi, Vietnam (2008)
Lyu, D.-C., Lyu, R.-Y.: Language Identification on Code-Switching Utterances Using Multiple Cues. In: Proc of Interspeech (2008)
Lopez de Ipiña, K., Graña, M., Ezeiza, N., Hernández, M., Zulueta, E., Ezeiza, A., Tovar, C.: Selection of Lexical Units for Continuous Speech Recognition of Basque. In: Sanfeliu, A., Ruiz-Shulcloper, J. (eds.) CIARP 2003. LNCS, vol. 2905, pp. 244–250. Springer, Heidelberg (2003)
Barroso, N., Ezeiza, A., Gilisagasti, N., Lopez de Ipiña, K., López, A., López, J.M.: Development of Multimodal Resources for Multilingual Information Retrieval in the Basque context. In: Proccedings of Interspeech 2007, Antwerp, Belgium (2007)
Li, H., Ma, B.: A Phonotactic Language Model for Spoken LID. In: ACL 2005 (2005)
Ma, B., Li, H.: An Acoustic Segment Modeling Approach to Automatic Language Identification. In: Proc. Interspeech 2005, Lisbon, Portugal, pp. 2829–2832 (2005)
Matejka, P., Schwarz, P., Cernocky, J., Chytil, P.: Phonotactic LID using High Quality Phoneme Recognition. In: Proc. Interspeech 2005, Lisbon, Portugal, pp. 2237–2240 (2005)
Nagarajan, T., Murthy, H.A.: Language Identification, Using Parallel Syllable-Like Unit Recognition. In: Proc ICASSP 2004, pp. 401–404 (2004)
Vandecatseye, A., Martens, J.P., Neto, J., Meinedo, H., Garcia-Mateo, C., Dieguez, F.J., Mihelic, F., Zibert, J., Nouza, J., David, P., Pleva, M., Cizmar, A., Papageorgiou, H., Alexandris, C.: The COST278 pan-European Broadcast News Database. In: Proceedings of LREC 2004, Lisbon, Portugal (2004)
Wheatley, B., Kondo, K., Anderson, W., Muthusamy, Y.: An evaluation of Cross-Language Adaptation for Rapid HMM Development in a New Language. In: International Conference on Acoustics, Speech, and Signal Processing, Adelaine, pp. 237–240 (1994)
Toledano, D., Moreno, A., Colás, J., Garrido, J.: Acoustic-phonetic decoding of different types of spontaneous speech in Spanish. In: Disfluencies in Spontaneous Speech 2005, Aix-en-Provence, France (2005)
Padrell, J., Martín-Iglesias, D., Díaz-de-María, F.: Support Vector Machines for Continuous Speech Recognition. In: 14th European Signal Processing Conference (BSSIPCO 2006), Florence, Italy, September 4-8 (2006)
Ganapathiraju, A., Hmaker, J., Picone, J.: Hybrid SVM/HMM architectures for speech recognition. In: Proc. of the International Conference on Spoken Language Processing, vol. 4, pp. 504–507 (2000)
Smith, N., Gales, M.: Speech recognition using SVMs. In: Advances in Neural Information Processing Systems, vol. 14. MIT Press, Cambridge (2002)
Cosi, P.: Hybrid HMM-NN architectures for connected digit recognition. In: Proc. of the International Joint Conference on Neural Networks, vol. 5 (2000)
Friedman, J.H.: Regularized discriminant analysis. Journal of the American Statistical Association 84, 165–175 (1989)
Martinez, A., Kak, A.: PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(2), 228–233 (2001)
Hoffbeck, J.P., Landgrebe, D.: Covariance estimation and classification with limited training data. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(7), 763–767 (1996)
Tadjudin, S., Landgrebe, D.: Classification of high dimensional data with limited training samples. Technical Report TRECE 98-8. School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana (1998)
Tadjudin, S., Landgrebe, D.: Covariance Estimation with Limited Training Samples. IEEE Transaction on Geoscience and Remote Sensing 37 (2000); Appendix: Springer-Author Discount
Ambikairajah, L., Choi, E.: Robust language identification based on fused phonotactic information with MLKSFM ICME. In: IEEE International Conference on pre-classifier, Multimedia and Expo. (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barroso, N., de Ipiña, K.L., Graña, M., Ezeiza, A. (2011). Language Identification for Under-Resourced Languages in the Basque Context. In: Corchado, E., Snášel, V., Sedano, J., Hassanien, A.E., Calvo, J.L., Ślȩzak, D. (eds) Soft Computing Models in Industrial and Environmental Applications, 6th International Conference SOCO 2011. Advances in Intelligent and Soft Computing, vol 87. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19644-7_50
Download citation
DOI: https://doi.org/10.1007/978-3-642-19644-7_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19643-0
Online ISBN: 978-3-642-19644-7
eBook Packages: EngineeringEngineering (R0)