Data-Driven Phone Selection for Language Identification via Bidirectional Long Short-Term Memory Modeling

Song, Xiao; Cheng, Qiang; Xing, Jingping; Zou, Yuexian

doi:10.1007/978-981-13-1648-7_26

Part of the book series: Communications in Computer and Information Science (CCIS, volume 873)

Included in the following conference series:

International Symposium on Intelligence Computation and Applications

Abstract

In this paper, we propose a new phone selection method for language identification (LID) that selects the phones scoring highest in similarity to the target language. The selection is data-driven, which avoids relying on complex semantic knowledge and significantly reduces the manual cost of learning different languages. Bidirectional long short-term memory (BLSTM) networks can provide more accurate frame-level content alignments by exploiting sequence information over longer durations, which has improved automatic speech recognition (ASR) performance. In principle, the output of a BLSTM-based ASR system, in the form of a phone lattice, contains more candidates, which can reduce the adverse effect of practical factors such as variations in channel, noise, and accent. Therefore, initial phone sequences are first extracted from the phone lattice generated from the recognition results of a BLSTM-based ASR system. Second, an asymmetrical distance between each phone and the target language is proposed and applied to weight the initial phone sequences. Language-related phones are then selected from the weighted phones. Finally, the selected phones are used to re-score input sentences for the LID system. Extensive experiments on the AP16-OLR Challenge validate the effectiveness of the proposed method. The results show that the selected phones are more effective for LID than the remaining phones. Our method yields an improvement of up to 39.96% in terms of C_avg compared with the method without phone selection.
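The weighting, selection, and re-scoring steps described in the abstract can be illustrated with a minimal sketch. The abstract does not define the asymmetrical distance, so the per-phone Kullback-Leibler term used here is an assumption, as are all function names, the add-one smoothing, and the use of plain phone lists in place of lattices from a BLSTM-based recognizer. It is not the authors' implementation.

```python
import math
from collections import Counter


def phone_distribution(sequences, vocab, smoothing=1.0):
    """Smoothed unigram distribution of phones over a fixed vocabulary."""
    counts = Counter(p for seq in sequences for p in seq)
    total = sum(counts[p] for p in vocab) + smoothing * len(vocab)
    return {p: (counts[p] + smoothing) / total for p in vocab}


def asymmetric_weights(target_dist, other_dist):
    """Per-phone KL term p_t * log(p_t / p_o).

    ASSUMPTION: stands in for the paper's asymmetrical distance.
    Swapping the two arguments generally changes the result.
    """
    return {
        p: target_dist[p] * math.log(target_dist[p] / other_dist[p])
        for p in target_dist
    }


def select_phones(weights, top_k):
    """Keep the top_k phones with the largest weight (ties broken by name)."""
    ranked = sorted(weights, key=lambda p: (-weights[p], p))
    return set(ranked[:top_k])


def rescore(sequence, weights, selected):
    """Average weight of the selected phones in a sequence (0.0 if none occur)."""
    kept = [weights[p] for p in sequence if p in selected]
    return sum(kept) / len(kept) if kept else 0.0


# Toy data: hypothetical phone sequences for the target and another language.
vocab = ["a", "b", "c", "d"]
target_seqs = [["a", "b", "a", "c"], ["a", "a", "b"]]
other_seqs = [["c", "d", "c", "b"], ["d", "c", "d"]]

target_dist = phone_distribution(target_seqs, vocab)
other_dist = phone_distribution(other_seqs, vocab)
weights = asymmetric_weights(target_dist, other_dist)
selected = select_phones(weights, top_k=2)
print(sorted(selected), rescore(["a", "c", "b"], weights, selected))
```

In this toy setup, phones that are frequent in the target language and rare elsewhere receive the largest weights, so an utterance dominated by them gets a higher score than one containing none of the selected phones.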

Notes

1. www.speechocean.com

Acknowledgements

This work is partially supported by the Key Technologies Research & Development Program of Shenzhen (No: JSGG20150512160434776) and the Key Technologies Research & Development of Data Retrieval and Monitoring via Multi-layer Network (No: JSGG20160229121006579).

Author information

Authors and Affiliations

ADSPLAB/Intelligent Lab, SECE, Peking University, Shenzhen, China
Xiao Song & Yuexian Zou
PKU Shenzhen Institute, Shenzhen, China
Xiao Song
Shenzhen Press Group, Shenzhen, China
Qiang Cheng
Shenzhen Securities Information Co., Ltd, Shenzhen, China
Jingping Xing

Corresponding author

Correspondence to Yuexian Zou.

Editor information

Editors and Affiliations

College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
Kangshun Li
Jiangxi University of Science and Technology, Ganzhou, Jiangxi, China
Wei Li
Chemical and Petroleum Engineering, University of Calgary, Calgary, Alberta, Canada
Zhangxing Chen
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu, Fukushima, Japan
Yong Liu

Cite this paper

Song, X., Cheng, Q., Xing, J., Zou, Y. (2018). Data-Driven Phone Selection for Language Identification via Bidirectional Long Short-Term Memory Modeling. In: Li, K., Li, W., Chen, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2017. Communications in Computer and Information Science, vol 873. Springer, Singapore. https://doi.org/10.1007/978-981-13-1648-7_26

DOI: https://doi.org/10.1007/978-981-13-1648-7_26
Published: 21 July 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1647-0
Online ISBN: 978-981-13-1648-7
eBook Packages: Computer Science, Computer Science (R0)
