Data-Driven Phone Selection for Language Identification via Bidirectional Long Short-Term Memory Modeling

  • Conference paper
Computational Intelligence and Intelligent Systems (ISICA 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 873)

Abstract

In this paper, we propose a new phone selection method for language identification (LID) that selects the higher-scoring phones, i.e., those more similar to the target language. The selection is data-driven, which avoids reliance on complex semantic knowledge and significantly reduces the manual cost of learning different languages. Recently, bidirectional long short-term memory (BLSTM) networks have been shown to provide more accurate content frame alignments by exploiting sequence information over longer durations, which has improved automatic speech recognition (ASR) performance. In principle, the output of a BLSTM-based ASR system contains more candidates in the form of a phone lattice, which can reduce the adverse effects of practical factors such as variations in channel, noise, and accent. First, initial phone sequences are therefore extracted from the phone lattices generated from the recognition results of the BLSTM-based ASR system. Second, an asymmetric distance between each phone and the target language is proposed and applied to weight the initial phone sequences. Language-related phones are then selected from the weighted phones. Finally, the selected phones are used to re-score input sentences for the LID system. Extensive experiments have been conducted on the AP16-OLR Challenge to validate the effectiveness of the proposed method. The results show that the selected phones are more effective for LID than the remaining phones. Our method gives a relative improvement of up to 39.96% in terms of Cavg compared with the same method without phone selection.
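The weight, select, and re-score steps described above can be illustrated with a minimal sketch. The abstract does not define the asymmetric distance, so the code below substitutes an assumed stand-in: each phone's contribution to the Kullback-Leibler divergence between the target-language phone distribution and the pooled distribution of the other languages. It also uses plain phone sequences rather than lattices. All function names, the smoothing constant, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def phone_distribution(sequences, vocab, smoothing=1.0):
    """Relative phone frequencies with additive smoothing (assumed, not from the paper)."""
    counts = Counter(p for seq in sequences for p in seq)
    total = sum(counts[p] for p in vocab) + smoothing * len(vocab)
    return {p: (counts[p] + smoothing) / total for p in vocab}

def phone_weights(target_seqs, other_seqs, vocab):
    """Assumed asymmetric score: each phone's term in KL(target || others)."""
    p_t = phone_distribution(target_seqs, vocab)
    p_o = phone_distribution(other_seqs, vocab)
    return {p: p_t[p] * math.log(p_t[p] / p_o[p]) for p in vocab}

def select_phones(weights, top_k):
    """Keep the top_k highest-weighted phones (ties broken alphabetically)."""
    ranked = sorted(weights, key=lambda p: (-weights[p], p))
    return set(ranked[:top_k])

def rescore(utterance, weights, selected):
    """Average weight of the selected phones in an utterance; 0.0 if none occur."""
    kept = [weights[p] for p in utterance if p in selected]
    return sum(kept) / len(kept) if kept else 0.0

# Toy phone sequences standing in for decoded ASR output.
target_seqs = [["a", "b", "a"], ["a", "c"]]
other_seqs = [["b", "c", "b"], ["c", "b"]]
vocab = ["a", "b", "c"]

weights = phone_weights(target_seqs, other_seqs, vocab)
selected = select_phones(weights, top_k=1)
score_target_like = rescore(["a", "b"], weights, selected)
score_other_like = rescore(["b", "c"], weights, selected)
```

In this toy setup the phone that is frequent in the target language and rare elsewhere receives the largest weight, so an utterance containing it scores higher than one that does not. A lattice-based variant would presumably accumulate posterior-weighted counts instead of raw counts, but that detail is not specified in the abstract.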

Notes

  1. www.speechocean.com

Acknowledgements

This work is partially supported by the Key Technologies Research & Development Program of Shenzhen (No. JSGG20150512160434776) and by Key Technologies Research & Development of Data Retrieval and Monitoring via Multi-layer Network (No. JSGG20160229121006579).

Corresponding author

Correspondence to Yuexian Zou.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Song, X., Cheng, Q., Xing, J., Zou, Y. (2018). Data-Driven Phone Selection for Language Identification via Bidirectional Long Short-Term Memory Modeling. In: Li, K., Li, W., Chen, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2017. Communications in Computer and Information Science, vol 873. Springer, Singapore. https://doi.org/10.1007/978-981-13-1648-7_26

  • DOI: https://doi.org/10.1007/978-981-13-1648-7_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1647-0

  • Online ISBN: 978-981-13-1648-7

  • eBook Packages: Computer Science, Computer Science (R0)
