Abstract
Developing a fusion-based system is a key research issue in modern Language Identification (LID). In this paper we investigate existing fusion techniques for LID systems and propose an alternative. By directly exploiting language-dependent contribution information, we introduce and implement a novel Language-Dependent Weighting approach. We investigate several contribution measures, including LID performance, likelihood ratios, and Kullback–Leibler divergence, derived either from development datasets or from class models. The advantage of language-dependent weighting over language-independent weighting is illustrated with a Language-Dependent Contribution Map. Both the OGI and CallFriend databases show very similar contribution patterns, which are related to language characteristics. Experiments on the NIST LRE 2003 task and the OGI database demonstrate that the proposed fusion technique outperforms other recent fusion techniques when the amount of available development data is limited. In particular, the system based on Kullback–Leibler divergence achieved the best performance while eliminating the need for development data.
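The abstract describes weighting each subsystem's likelihood score per language, with weights derived from how well that subsystem separates a given language from the others (e.g. via Kullback–Leibler divergence between class models). The paper's exact formulas are not reproduced here; the following is only an illustrative sketch under two stated assumptions: each class model is a single diagonal-covariance Gaussian (so KL divergence has a closed form), and weights are the mean KL divergence from a language's model to all other languages' models within a subsystem, normalized per language across subsystems. The function names and data layout are hypothetical.

```python
import numpy as np

def gaussian_kl(mu0, var0, mu1, var1):
    """Closed-form KL divergence between two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0) + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def language_dependent_weights(models):
    """models: one dict per subsystem, mapping language -> (mean, variance).

    The weight for (subsystem, language) is the mean KL divergence from that
    language's model to every other language's model in the same subsystem:
    a subsystem that separates a language well gets a larger weight for it.
    Weights are normalized to sum to 1 across subsystems for each language.
    """
    langs = sorted(models[0])
    raw = np.zeros((len(models), len(langs)))
    for s, subsystem in enumerate(models):
        for j, lang in enumerate(langs):
            mu_l, var_l = subsystem[lang]
            kls = [gaussian_kl(mu_l, var_l, *subsystem[m]) for m in langs if m != lang]
            raw[s, j] = np.mean(kls)
    return raw / raw.sum(axis=0, keepdims=True)

def fuse(scores, weights):
    """scores: (n_subsystems, n_languages) likelihood scores.

    Returns the language-dependent weighted combination, one fused
    score per language.
    """
    return (weights * scores).sum(axis=0)
```

Note that this measure needs only the trained class models, which matches the abstract's point that the KL-based variant eliminates the need for held-out development data.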
Cite this article
Yin, B., Ambikairajah, E. & Chen, F. Language-Dependent Contribution Measuring and Weighting for Combining Likelihood Scores in Language Identification Systems. J Sign Process Syst Sign Image Video Technol 59, 201–210 (2010). https://doi.org/10.1007/s11265-008-0291-6