On the Use of Phonotactic Vector Representations with FastText for Language Identification

  • Chapter
Conversational Dialogue Systems for the Next Decade

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 704)

Abstract

This paper explores how to learn better word vector representations for language identification (LID). We focus on a phonotactic approach in which phoneme sequences are grouped into phonotactic units (phone-grams) that incorporate context information. To take the internal structure (morphology) of phone-grams into account, we use sub-word information (lower-order n-grams) to learn phone-gram embeddings with FastText. These embeddings are used as input to an i-Vector framework to train a multiclass logistic classifier. Using Cavg as the evaluation metric, we compare our approach with an LID system whose phone-gram embeddings are learned with Skipgram and do not use sub-word information. Incorporating sub-word information into phone-gram embeddings significantly improves on the results obtained with embeddings learned while ignoring the structure of phone-grams. Furthermore, we show that our system provides information complementary to an acoustic system, improving it when the two systems are fused.
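The phone-gram and sub-word ideas in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. The separator `_`, the boundary symbols `<` and `>`, the sub-unit orders (`min_n=1`, `max_n=2`), the 8-dimensional randomly initialized lookup table, and the helper names `phone_grams`, `subword_units` and `embed` are all hypothetical choices made for illustration, and phoneme labels are assumed not to contain the separator. Following the way FastText builds a word vector from the word's character n-grams, each phone-gram is split into lower-order phone n-grams plus a boundary-marked copy of the whole unit, and its vector is the average of the vectors of those units. In a trained model the table would hold learned parameters rather than random values.

```python
import random
import zlib
from typing import Dict, List

SEP = "_"  # hypothetical separator between phoneme symbols inside a phone-gram


def phone_grams(phonemes: List[str], n: int) -> List[str]:
    """Join every window of n consecutive phonemes into one phone-gram token."""
    return [SEP.join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def subword_units(phone_gram: str, min_n: int = 1, max_n: int = 2) -> List[str]:
    """Split a phone-gram into lower-order phone n-grams, FastText style.

    '<' and '>' mark the phone-gram boundaries, as FastText does for words.
    Lone boundary markers are skipped, and the whole boundary-marked phone-gram
    is kept as one extra unit, mirroring FastText's whole-word vector.
    """
    symbols = ["<"] + phone_gram.split(SEP) + [">"]
    units = []
    for n in range(min_n, max_n + 1):
        for i in range(len(symbols) - n + 1):
            gram = symbols[i:i + n]
            if gram == ["<"] or gram == [">"]:
                continue
            units.append(SEP.join(gram))
    units.append(SEP.join(symbols))
    return units


def embed(phone_gram: str, table: Dict[str, List[float]], dim: int = 8,
          min_n: int = 1, max_n: int = 2) -> List[float]:
    """Average the vectors of a phone-gram's sub-units.

    Units missing from the table get a random vector seeded from the CRC32 of
    the unit, so results are reproducible across runs; a trained model would
    look up learned vectors instead (assumption for illustration only).
    """
    vectors = []
    for unit in subword_units(phone_gram, min_n, max_n):
        if unit not in table:
            rng = random.Random(zlib.crc32(unit.encode("utf-8")))
            table[unit] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        vectors.append(table[unit])
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    seq = ["k", "a", "s", "a"]  # toy phoneme sequence
    grams = phone_grams(seq, 3)
    print(grams)
    print(subword_units(grams[0]))
    table: Dict[str, List[float]] = {}
    print([round(x, 3) for x in embed(grams[0], table)])
```

For the toy sequence `k a s a`, `phone_grams(seq, 3)` gives `['k_a_s', 'a_s_a']`, and `subword_units('k_a_s')` gives `['k', 'a', 's', '<_k', 'k_a', 'a_s', 's_>', '<_k_a_s_>']`. Because `a_s` occurs in both phone-grams, their embeddings share a component, which is how sub-word information lets rare or unseen phone-grams borrow from frequent ones. The official fastText tool counts its `minn`/`maxn` sub-word lengths in characters, so one way to obtain a similar phoneme-level effect with it would be to map each phoneme to a single character before training; whether the chapter's system does this is not stated here.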

Author information

Correspondence to David Romero.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Romero, D., Salamea, C. (2021). On the Use of Phonotactic Vector Representations with FastText for Language Identification. In: D'Haro, L.F., Callejas, Z., Nakamura, S. (eds) Conversational Dialogue Systems for the Next Decade. Lecture Notes in Electrical Engineering, vol 704. Springer, Singapore. https://doi.org/10.1007/978-981-15-8395-7_25

  • DOI: https://doi.org/10.1007/978-981-15-8395-7_25

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-8394-0

  • Online ISBN: 978-981-15-8395-7

  • eBook Packages: Engineering (R0)
