HarkMan—A vocabulary-independent keyword spotter for spontaneous Chinese speech

Correspondence, Journal of Computer Science and Technology

Abstract

This paper introduces the techniques adopted in HarkMan, a keyword spotter designed to automatically spot given words, as a vocabulary-independent task, in unconstrained Chinese telephone speech. Neither the speaking manner nor the number of keywords is limited. The paper focuses on the techniques that address acoustic modeling, the keyword-spotting network, search strategies, robustness, and rejection. The underlying technologies are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value over 90%.
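The abstract does not define the figure of merit it reports. As a rough illustration only, the sketch below assumes the definition commonly used in keyword-spotting evaluations: the detection rate averaged over operating points from 1 to 10 false alarms per keyword per hour. The function name, the input layout, and the simplification to a single keyword without interpolation are assumptions made here, not details taken from the paper.

```python
from typing import List, Tuple


def figure_of_merit(hits: List[Tuple[float, bool]], n_true: int, hours: float) -> float:
    """Simplified figure of merit for one keyword (hypothetical helper).

    hits   : putative detections as (score, is_true_hit) pairs
    n_true : number of actual keyword occurrences in the test speech
    hours  : duration of the test speech in hours
    """
    if n_true <= 0:
        raise ValueError("n_true must be positive")

    # Number of false-alarm operating points: 10 per hour of speech.
    # Assumption: rounded to an integer, with no interpolation term.
    n_levels = max(1, round(10 * hours))

    # Walk through detections from the most to the least confident.
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    detected = 0
    rates = []
    for _score, is_true in ranked:
        if is_true:
            detected += 1
        else:
            # Detection rate reached just before this false alarm.
            rates.append(detected / n_true)
            if len(rates) == n_levels:
                break

    # If the list ran out of false alarms, the remaining operating
    # points keep the final detection rate.
    while len(rates) < n_levels:
        rates.append(detected / n_true)

    return sum(rates) / n_levels


if __name__ == "__main__":
    example = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
    print(figure_of_merit(example, n_true=2, hours=0.2))  # 0.75
```

Under this assumed definition, a value over 90% would mean that, averaged across those false-alarm levels, more than nine in ten keyword occurrences are detected.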

Author information

Correspondence to Zheng Fang.

Additional information

ZHENG Fang was born in 1967. He received his B.S., M.S. and Ph.D. degrees from Tsinghua University in 1990, 1992 and 1997, respectively. He is currently an Associate Professor at Tsinghua University, an Associate Director of the Department of Computer Science & Technology, the Director of the Speech Lab, and the Director of the Analog Devices Inc.-Tsinghua DSP Technology Research Center. His research interests include acoustic and language modeling, isolated and continuous speech recognition, keyword spotting, dictation, and language understanding.

XU Mingxing was born in 1973. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1995. He is currently a Ph.D. candidate in computer application. His research interests include speech recognition and language processing.

MOU Xiaolong was born in 1973. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1996. He is currently an M.S. candidate in computer application. His research interests include speech recognition and language processing.

WU Jian was born in 1975. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1998. He is currently an M.S. candidate in computer application. His research interests include speech recognition and language processing.

WU Wenhu was born in 1936. He studied in the Department of Electrical Engineering, Tsinghua University, from 1955 to 1958, and then in the Department of Automation, Tsinghua University, from 1958 to 1961. Since then, he has been teaching at Tsinghua University, where he is now a Full Professor in the Department of Computer Science and Technology; he was its Associate Director from 1990 to 1997. He is devoted to the study of Chinese speech recognition and understanding, especially speaker-independent Chinese speech recognition. He is the chairman of the Computer Spread Education Commission of the CCF (China Computer Federation). He led the China Team at IOI'89 through IOI'95 (International Olympiad in Informatics), where it won many gold medals.

FANG Ditang was born in 1930. He received the B.S. degree from Shanghai Jiaotong University and the M.S. degree from Tsinghua University, both in electrical engineering, in 1953 and 1956, respectively. Since then, he has been teaching at Tsinghua University, where he is now a Full Professor in the Department of Computer Science and Technology. In 1979, he founded the Laboratory for Human-Machine Speech Communications and was its director from 1979 to 1990. The laboratory received the National Scientific Research and Technology Progress Award in 1987 and 1989, and a National Scientific Invention Award in 1990. He is the Deputy Chief of the Artificial Intelligence and Pattern Recognition Committee of the Chinese Computer Science Society.

About this article

Cite this article

Zheng, F., Xu, M., Mou, X. et al. HarkMan—A vocabulary-independent keyword spotter for spontaneous Chinese speech. J. Comput. Sci. & Technol. 14, 18–26 (1999). https://doi.org/10.1007/BF02952483
