Language Identification for Under-Resourced Languages in the Basque Context

Barroso, Nora; de Ipiña, Karmele López; Graña, Manuel; Ezeiza, Aitzol

doi:10.1007/978-3-642-19644-7_50

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 87)

Abstract

Automatic Speech Recognition (ASR) is a broad research area that attracts considerable effort from the research community. Interest in Multilingual Systems arises in the Basque Country because there are three official languages (Basque, Spanish, and French) and much linguistic interaction among them, even though Basque has very different roots from the other two languages. The development of Multilingual Large Vocabulary Continuous Speech Recognition systems involves issues such as Language Identification, Acoustic Phonetic Decoding, Language Modeling, and the development of appropriate Language Resources. This paper describes the development of a Language Identification (LID) system oriented to robust Multilingual Speech Recognition in the Basque context. The work presents hybrid strategies for LID, based on the selection of system elements by several classifiers and Discriminant Analysis improved with robust regularized covariance matrix estimation methods oriented to under-resourced languages and stochastic methods for speech recognition tasks (Hidden Markov Models and n-grams).
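The abstract names n-grams among the stochastic methods but does not say how they are applied. As a rough illustration only, the sketch below assumes a common phonotactic setup: one add-one-smoothed bigram model per language, trained on symbol sequences, with an utterance assigned to the language whose model gives it the highest log-likelihood. The function names, the labels `lang_a` and `lang_b`, and the toy symbol strings are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical boundary markers


def train_bigram(sequences, vocab):
    """Count symbol bigrams (with boundary markers) for one language."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = [BOS] + list(seq) + [EOS]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    # Predicted symbols are the vocabulary plus the end marker.
    return bigrams, contexts, len(vocab) + 1


def log_likelihood(seq, model):
    """Add-one-smoothed bigram log-likelihood of a symbol sequence."""
    bigrams, contexts, n_outcomes = model
    padded = [BOS] + list(seq) + [EOS]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + n_outcomes)
        total += math.log(prob)
    return total


def identify(seq, models):
    """Return the label of the model that scores the sequence highest."""
    return max(models, key=lambda lang: log_likelihood(seq, models[lang]))


# Invented toy data: abstract symbols, not real phone transcriptions.
vocab = {"a", "b", "c"}
models = {
    "lang_a": train_bigram(["abab", "abba"], vocab),
    "lang_b": train_bigram(["bcbc", "cbcb"], vocab),
}

if __name__ == "__main__":
    print(identify("abab", models))
    print(identify("cbc", models))
```

Smoothing matters most when training data is scarce, as with under-resourced languages: without it, a single unseen bigram would give an utterance zero probability under that language's model.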

Author information

Authors and Affiliations

Irunweb Enterprise, Auzolan 2B – 2, Irun, 20303, Basque Country
Nora Barroso
Grupo de Inteligencia Computacional, UPV/EHU, Spain
Karmele López de Ipiña, Manuel Graña & Aitzol Ezeiza

Editor information

Editors and Affiliations

Universidad de Salamanca, Plaza de la Merced S/N, 37008, Salamanca, Spain
Emilio Corchado
VŠB-TU Ostrava, 17. listopadu 15, 70833, Ostrava, Czech Republic
Václav Snášel
University of Burgos, Avenida Cantaria S/N, 09006, Burgos, Spain
Javier Sedano
Cairo University, 5 Ahmed Zewal St., Orman, Cairo, Egypt
Aboul Ella Hassanien
University of La Coruña, Avda. 19 de Febrero, S/N, A Coruña, 15403, Ferrol, Spain
José Luis Calvo
Infobright, 47 Colborne Street, Suite 403, M5E1P8, Toronto, Ontario, Canada
Dominik Ślȩzak

About this paper

Cite this paper

Barroso, N., de Ipiña, K.L., Graña, M., Ezeiza, A. (2011). Language Identification for Under-Resourced Languages in the Basque Context. In: Corchado, E., Snášel, V., Sedano, J., Hassanien, A.E., Calvo, J.L., Ślȩzak, D. (eds) Soft Computing Models in Industrial and Environmental Applications, 6th International Conference SOCO 2011. Advances in Intelligent and Soft Computing, vol 87. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19644-7_50

DOI: https://doi.org/10.1007/978-3-642-19644-7_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19643-0
Online ISBN: 978-3-642-19644-7
eBook Packages: Engineering, Engineering (R0)
