Abstract
High-performance Mandarin digit recognition (MDR) is much more difficult to achieve than its English counterpart, especially on inexpensive hardware. This paper presents a new postprocessor based on multi-layer perceptrons (MLP), an a posteriori probability estimator, and uses it as the rejection model of a speaker-independent Mandarin digit recognition system based on hidden Markov models (HMM). Poor utterances, which are recognized by the HMMs but have a low a posteriori probability, are rejected. After rejecting about 4.9% of the tested utterances, the MLP rejection model boosts the digit recognition accuracy from 97.1% to 99.6%. This performance is better than that of rejection models based on linear discrimination, likelihood ratio, or anti-digit models.
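The rejection rule and the reported figures can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hypothesis` type, the `threshold` parameter, and both function names are hypothetical, and the arithmetic assumes that both accuracies refer to the same test set and that the post-rejection accuracy is measured over accepted utterances only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hypothesis:
    digit: int        # digit chosen by the HMM recognizer
    posterior: float  # assumed MLP estimate of the a posteriori probability


def apply_rejection(hyps: List[Hypothesis], threshold: float) -> List[Optional[int]]:
    """Keep the HMM result when the posterior reaches the threshold, else reject (None)."""
    return [h.digit if h.posterior >= threshold else None for h in hyps]


def implied_breakdown(acc_before: float, acc_after: float, reject_rate: float) -> Dict[str, float]:
    """Per 100 test utterances, split the rejections into errors caught and correct results lost.

    Assumption: acc_after is computed over the accepted utterances only.
    """
    errors_before = 100.0 * (1.0 - acc_before)
    accepted = 100.0 * (1.0 - reject_rate)
    errors_after = accepted * (1.0 - acc_after)
    errors_rejected = errors_before - errors_after
    correct_rejected = 100.0 * reject_rate - errors_rejected
    return {
        "errors_before": errors_before,
        "errors_after": errors_after,
        "errors_rejected": errors_rejected,
        "correct_rejected": correct_rejected,
    }


if __name__ == "__main__":
    # Toy posteriors, chosen only to exercise the rule.
    decisions = apply_rejection([Hypothesis(3, 0.92), Hypothesis(7, 0.31)], threshold=0.5)
    print(decisions)
    # Figures quoted in the abstract: 97.1% -> 99.6% after rejecting 4.9%.
    print(implied_breakdown(0.971, 0.996, 0.049))
```

Under these assumptions, roughly 2.5 of the 2.9 recognition errors per 100 utterances fall among the rejected utterances, at the cost of rejecting about 2.4 correctly recognized ones.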
Author information
Additional information
This project is supported by the National Natural Science Foundation of China (Grant No.69975007), the National “863” High-Tech Programme of China (No.863-306-ZD13-04-6), the Open Funds of the National Laboratory of Pattern Recognition, and Intel Architecture Development Co., Ltd.
ZHONG Lin received his B.S. and M.S. degrees in circuits and systems from Tsinghua University, Beijing, China, in 1998 and 2000, respectively. He is now a Ph.D. candidate in the Electronic Engineering Department, Princeton University, US.
LIU Jia received his B.S., M.S., and Ph.D. degrees in communication and electronic systems from Tsinghua University, Beijing, China, in 1983, 1986 and 1990, respectively. In April 1990, he joined the Remote Sensing Satellite Ground Station, Chinese Academy of Sciences, and then he worked as a Royal Society visiting scientist at the Engineering Department, Cambridge University, UK during 1992–1994. He is now a professor in the Department of Electronic Engineering, Tsinghua University and an IEEE member. His current research focuses on speech recognition, speech synthesis, speech coding, speech ASIC design and multimedia communication.
LIU Runsheng graduated from the Department of Radio and Electronics, Tsinghua University, Beijing, China, in 1958. He has worked at Tsinghua University since then and is now a professor in the Department of Electronic Engineering, where he teaches and conducts research on digital and analog circuits, IC design, electronic circuit CAD, signal processing and digital communication.
About this article
Cite this article
Zhong, L., Liu, J. & Liu, R. A rejection model based on multi-layer perceptrons for Mandarin digit recognition. J. Comput. Sci. & Technol. 17, 196–202 (2002). https://doi.org/10.1007/BF02962212
DOI: https://doi.org/10.1007/BF02962212