On the Use of Phonotactic Vector Representations with FastText for Language Identification

Romero, David; Salamea, Christian

doi:10.1007/978-981-15-8395-7_25


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 704)


Abstract

This paper explores a better way to learn word vector representations for language identification (LID). We focus on a phonotactic approach in which phoneme sequences are grouped into phonotactic units (phone-grams) that incorporate context information. To take the morphology of phone-grams into account, we use sub-word information (lower-order n-grams) to learn phone-gram embeddings with FastText. These embeddings are the input to an i-Vector framework, which is used to train a multiclass logistic classifier. We compare our approach with a LID system whose phone-gram embeddings are learned through Skipgram without sub-word information, using Cavg as the evaluation metric. Incorporating sub-word information into phone-gram embeddings significantly improves on the results obtained with embeddings that are learned while ignoring the structure of phone-grams. Furthermore, we show that our system provides complementary information to an acoustic system, which improves when the two systems are fused.
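As a rough illustration of the units the abstract refers to, the sketch below builds phone-grams from a phoneme sequence and lists the lower-order n-grams contained in one of them. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `phone_grams` and `sub_grams`, the `_` separator, the window size, and the toy phoneme sequence are all hypothetical. FastText itself works on character n-grams with word-boundary markers; here whole phonemes are treated as the atomic symbols instead.

```python
from typing import List


def phone_grams(phonemes: List[str], n: int = 3, sep: str = "_") -> List[str]:
    """Slide a window of n phonemes over the sequence and join each window
    into a single unit (a phone-gram). Window size and separator are assumptions."""
    return [sep.join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def sub_grams(phone_gram: str, min_n: int = 1, max_n: int = 2, sep: str = "_") -> List[str]:
    """List the lower-order n-grams (orders min_n..max_n) inside one phone-gram.
    This mimics the idea of sub-word units, with phonemes as the atoms."""
    phones = phone_gram.split(sep)
    result = []
    for order in range(min_n, max_n + 1):
        for start in range(len(phones) - order + 1):
            result.append(sep.join(phones[start:start + order]))
    return result


# Toy phoneme sequence (hypothetical, for illustration only).
sequence = ["s", "p", "iy", "ch"]

units = phone_grams(sequence, n=3)
print(units)                # ['s_p_iy', 'p_iy_ch']
print(sub_grams(units[0]))  # ['s', 'p', 'iy', 's_p', 'p_iy']
```

In a sub-word model of this kind, the embedding of a phone-gram such as `s_p_iy` would be composed from the vectors of its lower-order parts, so phone-grams that share phonemes also share part of their representation.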



Author information

Authors and Affiliations

Interaction, Robotics and Automation Research Group, Universidad Politécnica Salesiana, Calle Vieja 12-30 y Elia Liut, Cuenca, Ecuador
David Romero & Christian Salamea
Speech Technology Group, Information and Telecommunication Center, Universidad Politécnica de Madrid, Av-Complutense 30, 28040, Madrid, Spain
Christian Salamea


Corresponding author

Correspondence to David Romero.

Editor information

Editors and Affiliations

Speech Technology Group - Information Processing and Telecommunications Center (IPTC), Universidad Politécnica de Madrid, Madrid, Spain
Luis Fernando D'Haro
Department of Languages and Computer Systems, Universidad de Granada, CITIC-UGR, Granada, Spain
Zoraida Callejas
Information Science, Nara Institute of Science and Technology, Ikoma, Japan
Satoshi Nakamura


About this chapter

Cite this chapter

Romero, D., Salamea, C. (2021). On the Use of Phonotactic Vector Representations with FastText for Language Identification. In: D'Haro, L.F., Callejas, Z., Nakamura, S. (eds) Conversational Dialogue Systems for the Next Decade. Lecture Notes in Electrical Engineering, vol 704. Springer, Singapore. https://doi.org/10.1007/978-981-15-8395-7_25


DOI: https://doi.org/10.1007/978-981-15-8395-7_25
Published: 25 October 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8394-0
Online ISBN: 978-981-15-8395-7
eBook Packages: Engineering, Engineering (R0)
