A log-index weighted cepstral distance measure for speech recognition

Journal of Computer Science and Technology

Abstract

A log-index weighted cepstral distance measure is proposed and tested in speaker-independent and speaker-dependent isolated word recognition systems using statistical techniques. In this measure, the weight applied to each cepstral coefficient equals the logarithm of that coefficient's index. Experimental results on three speech databases show that this measure works better than other weighted Euclidean cepstral distance measures. The error rate obtained with this measure averages about 1.8% over the three databases, a 25% reduction relative to the other weighted measures and a 40% reduction relative to the Log Likelihood Ratio (LLR) measure. The results also show that the measure works well in both speaker-dependent and speaker-independent speech recognition systems.
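
To make the weighting rule concrete, the following is a minimal sketch rather than the authors' implementation. The abstract states only that each weight is the logarithm of the coefficient index, so several details here are assumptions: indices start at 1, the logarithm is natural, and the weights scale the coefficient differences inside a squared Euclidean sum. The function names `log_index_weights` and `weighted_cepstral_distance` are hypothetical.

```python
import math


def log_index_weights(order):
    """Return weights w_k = ln(k) for k = 1..order.

    Assumption: indices start at 1 and the natural log is used, so the
    first coefficient receives weight ln(1) = 0.
    """
    return [math.log(k) for k in range(1, order + 1)]


def weighted_cepstral_distance(c_ref, c_test, weights):
    """Squared weighted Euclidean distance between two cepstral vectors.

    Assumption: each weight multiplies the coefficient difference before
    squaring, i.e. d = sum_k (w_k * (c_ref[k] - c_test[k])) ** 2.
    """
    if not (len(c_ref) == len(c_test) == len(weights)):
        raise ValueError("cepstral vectors and weights must have equal length")
    return sum((w * (a - b)) ** 2 for w, a, b in zip(weights, c_ref, c_test))


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    weights = log_index_weights(3)
    print(weighted_cepstral_distance([1.0, 0.5, 0.2], [0.8, 0.1, 0.4], weights))
```

Under these assumptions the first coefficient never contributes to the distance and higher-index coefficients are emphasized progressively, though more slowly than with a weight proportional to the index itself.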

Author information

Corresponding author

Correspondence to Zheng Fang.

Additional information

Zheng Fang was born in Jiangsu Province, P.R. China, in 1967. He received the B.S. and M.S. degrees from Tsinghua University, P.R. China, both in computer science and technology, in 1990 and 1992, respectively. He is now a lecturer and, at the same time, a Ph.D. candidate at Tsinghua University. He is also the Executive Director of the Analog Devices Inc.-Tsinghua DSP Technology Research Center. Since 1988, he has been working on speech recognition at the Speech Lab, Department of Computer Science and Technology, Tsinghua University.

Wu Wenhu was born in Beijing, P.R. China, in 1936. He studied in the Department of Electrical Engineering, Tsinghua University, from 1955 to 1958, and then in the Department of Automation, Tsinghua University, from 1958 to 1961. Since then he has been at Tsinghua University, where he is now a Professor in the Department of Computer Science and Technology and the Director of the Speech Lab. He is devoted to research on Chinese speech recognition and understanding, especially speaker-independent Chinese speech recognition, for which he has received several awards. He is also engaged in the popularization of computer education and is the Chairman of the Computer Spread Education Commission of CCF (Chinese Computer Federation). He led the China team at IOI’89 through IOI’95 (International Olympiad in Informatics), where the team won many gold medals.

Fang Ditang was born in Shanghai, P.R. China, in 1930. He received the B.S. degree from Jiaotong University and the M.S. degree from Tsinghua University, both in electrical engineering, in 1953 and 1956, respectively. Since then, he has been teaching at Tsinghua University, where he is now a Professor in the Department of Computer Science and Technology. In 1979, he founded the Laboratory for Human-Machine Speech Communications and served as its Director from 1979 to 1990. The laboratory won the National Scientific Research and Technology Progress Award twice, in 1987 and 1989, the National Scientific Invention Award in 1990, and three other awards. He is the Deputy Chief of the Artificial Intelligence and Pattern Recognition Committee of the Chinese Computer Science Society.

About this article

Cite this article

Zheng, F., Wu, W. & Fang, D. A log-index weighted cepstral distance measure for speech recognition. J. of Comput. Sci. & Technol. 12, 177–184 (1997). https://doi.org/10.1007/BF02951337

  • DOI: https://doi.org/10.1007/BF02951337
