Abstract
This paper describes the development of a Language Identification (LID) system for robust multilingual speech recognition in the Basque context, where three languages coexist: Basque, Spanish and French. The LID system is integrated into GorUP, a semantic speech recognition system for complex industrial environments described in Part I. The work presents hybrid strategies for LID in which system elements are selected by several classifiers (Support Vector Machines and Multilayer Perceptron) and by Discriminant Analysis improved with robust regularized covariance matrix estimation methods oriented to under-resourced languages, combined with stochastic methods for speech recognition tasks (Hidden Markov Models and n-grams). The LID tool manages the main elements of the Automatic Speech Recognition system (Acoustic Phonetic Decoder, Language Model and Lexicons).
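The abstract does not say which regularized covariance estimator is used, so the following is only an illustrative sketch of the general idea, written in the style of Friedman's regularized discriminant analysis. It assumes a simplified, unweighted blend of the class covariance with a pooled covariance, followed by shrinkage toward a scaled identity matrix. All function names, the parameters `lam` and `gamma`, and the toy feature vectors are hypothetical and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(rows: Matrix) -> Matrix:
    """Maximum-likelihood covariance of feature vectors (one vector per row)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
             for j in range(p)] for i in range(p)]


def regularize(class_cov: Matrix, pooled_cov: Matrix,
               lam: float, gamma: float) -> Matrix:
    """Blend class and pooled covariance, then shrink toward a scaled identity.

    lam   in [0, 1]: 0 keeps the class covariance, 1 uses the pooled one.
    gamma in [0, 1]: amount of shrinkage toward (average variance) * identity.
    Assumption: unweighted blending, a simplification of Friedman's estimator.
    """
    p = len(class_cov)
    mixed = [[(1 - lam) * class_cov[i][j] + lam * pooled_cov[i][j]
              for j in range(p)] for i in range(p)]
    avg_var = sum(mixed[i][i] for i in range(p)) / p
    return [[(1 - gamma) * mixed[i][j] + (gamma * avg_var if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def determinant(matrix: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


# Toy data (hypothetical): two languages, two 3-dimensional samples each,
# i.e. fewer samples than dimensions, as can happen with scarce resources.
samples_a = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]]
samples_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]

cov_a = sample_covariance(samples_a)
cov_b = sample_covariance(samples_b)
pooled = [[(cov_a[i][j] + cov_b[i][j]) / 2 for j in range(3)] for i in range(3)]

cov_a_reg = regularize(cov_a, pooled, lam=0.5, gamma=0.1)
```

With only two samples in three dimensions, `cov_a` is singular and cannot be inverted inside a discriminant function, whereas `cov_a_reg` is positive definite for any `gamma > 0`. In practice `lam` and `gamma` would typically be tuned, for example by cross-validation.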
Cite this article
Barroso, N., López de Ipiña, K., Hernández, C. et al. Semantic speech recognition in the Basque context Part II: language identification for under-resourced languages. Int J Speech Technol 15, 41–47 (2012). https://doi.org/10.1007/s10772-011-9114-4