Language Identification Based on Phone Decoding for Basque and Spanish

Guijarrubia, Víctor G.; Torres, M. Inés

doi:10.1007/978-3-540-72847-4_31

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4477)

Abstract

This paper presents experiments in language identification for Spanish and Basque, both official languages of the Basque Country in the north of Spain. We focus on four methods based on phone decoding, some of which make use of phonotactic knowledge. We also compare a generic phonotactic model with a task-specific one. Despite poor initial performance, significant accuracies are achieved when better phonotactic knowledge is used. The task-specific phonotactic model performs slightly better, but it is only useful with the less expensive methods. Finally, we present the temporal evolution of the accuracies. Results show that 5-6 seconds are enough to achieve a similar percentage of correctly classified utterances.

This work was partially supported by the CICYT project TIN2005-08660-C04-03 and by the University of the Basque Country under grant 9/UPV 00224.310-15900/2004.

Author information

Authors and Affiliations

Departamento de Electricidad y Electrónica, Universidad del País Vasco, Apartado 644, 48080 Bilbao, Spain
Víctor G. Guijarrubia & M. Inés Torres

Editor information

Joan Martí, José Miguel Benedí, Ana Maria Mendonça, Joan Serrat

About this paper

Cite this paper

Guijarrubia, V.G., Torres, M.I. (2007). Language Identification Based on Phone Decoding for Basque and Spanish. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2007. Lecture Notes in Computer Science, vol 4477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72847-4_31

DOI: https://doi.org/10.1007/978-3-540-72847-4_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72846-7
Online ISBN: 978-3-540-72847-4
eBook Packages: Computer Science, Computer Science (R0)