Language-Dependent Contribution Measuring and Weighting for Combining Likelihood Scores in Language Identification Systems

Journal of Signal Processing Systems

Abstract

Developing a fusion-based system is one of the key research issues in modern Language Identification (LID). In this paper we investigate existing fusion techniques for LID systems and propose an alternative. We introduce and implement a novel Language-Dependent Weighting approach that directly uses language-dependent contribution information. We investigate several contribution measures, including LID performance, likelihood ratios, and Kullback–Leibler divergence. These measures are derived from either development datasets or class models. The advantage of language-dependent weighting over language-independent weighting is illustrated using a Language-Dependent Contribution Map. The OGI and CallFriend databases show very similar contribution patterns, which are related to language characteristics. Experiments on the NIST LRE 2003 task and the OGI database demonstrate that the proposed fusion technique outperforms other recent fusion techniques when the amount of available development data is limited. In particular, the system based on Kullback–Leibler divergence achieved the best performance while eliminating the need for development data.
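As a rough illustration of the distinction between language-dependent and language-independent weighting, the sketch below fuses per-language scores from several subsystems. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the weighted-average form of the fusion, and the example numbers are all hypothetical. In a language-independent scheme every language would share one weight per subsystem; here each subsystem carries a separate weight for each language.

```python
from typing import List


def fuse_language_dependent(
    scores: List[List[float]], weights: List[List[float]]
) -> List[float]:
    """Fuse subsystem scores with one weight per (subsystem, language) pair.

    scores[k][l]  : log-likelihood score of subsystem k for language l
    weights[k][l] : assumed contribution weight of subsystem k for language l
    The weighted-average form used here is an illustrative assumption.
    """
    num_langs = len(scores[0])
    fused = []
    for lang in range(num_langs):
        # Normalise by the total weight for this language so that fused
        # scores stay on a comparable scale across languages.
        total_weight = sum(w[lang] for w in weights)
        weighted_sum = sum(w[lang] * s[lang] for w, s in zip(weights, scores))
        fused.append(weighted_sum / total_weight)
    return fused


def identify(fused: List[float]) -> int:
    """Return the index of the language with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical example: two subsystems, three languages.
example_scores = [
    [-1.0, -2.0, -3.0],  # subsystem 0
    [-2.5, -1.5, -2.0],  # subsystem 1
]
example_weights = [
    [0.8, 0.5, 0.2],  # subsystem 0 is trusted most for language 0
    [0.2, 0.5, 0.8],  # subsystem 1 is trusted most for language 2
]
example_fused = fuse_language_dependent(example_scores, example_weights)
example_decision = identify(example_fused)
```

In this sketch the weights would come from one of the contribution measures named above (for example, per-language LID performance on development data, or a divergence computed between class models); how those measures map to weights is left open here because the text does not specify it.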

Author information

Corresponding author

Correspondence to Bo Yin.

About this article

Cite this article

Yin, B., Ambikairajah, E. & Chen, F. Language-Dependent Contribution Measuring and Weighting for Combining Likelihood Scores in Language Identification Systems. J Sign Process Syst Sign Image Video Technol 59, 201–210 (2010). https://doi.org/10.1007/s11265-008-0291-6

  • DOI: https://doi.org/10.1007/s11265-008-0291-6
