HarkMan—A vocabulary-independent keyword spotter for spontaneous Chinese speech

Zheng, Fang; Xu, Mingxing; Mou, Xiaolong; Wu, Jian; Wu, Wenhu; Fang, Ditang

doi:10.1007/BF02952483

Correspondence
Published: January 1999

Volume 14, pages 18–26, (1999)
Journal of Computer Science and Technology

Zheng Fang¹, Xu Mingxing¹, Mou Xiaolong¹, Wu Jian¹, Wu Wenhu¹ & Fang Ditang¹

Abstract

This paper introduces a novel technique adopted in HarkMan, a keyword spotter designed to automatically spot given keywords for a vocabulary-independent task in unconstrained Chinese telephone speech. Neither the speaking manner nor the number of keywords is limited. The paper focuses on how the technique addresses acoustic modeling, the keyword-spotting network, search strategies, robustness, and rejection. The underlying technologies are useful not only for keyword spotting but also for continuous speech recognition. The system achieves a figure-of-merit value above 90%.

References

Rohlicek J R, Russell W, Roukos S, Gish H. Continuous hidden Markov modeling for speaker-independent word spotting. In ICASSP-89, 1989, 3: 627–630.
Furui S. Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans. ASSP, Feb. 1986, 34(1): 52–59.
Zheng F, Chai H X, Shi Z J, Wu W H, Fang D T. A real-world speech recognition system based on CDCPMs. In '97 Int. Conf. Computer Processing of Oriental Languages (ICCPOL'97), Apr. 2, 1997, Hong Kong, 1: 204–207.
Zheng Fang, Wu Wenhu, Fang Ditang. Center-distance continuous probability models and the distance measure. J. of Computer Science and Technology, 1998, 13(5): 426–437.
Zheng F, Wu W H, Fang D T. Speech recognition units in the Chinese dictation machines. In 4th National Conf. Man-Machine Speech Comm. (NCMMSC-96), Oct. 1996, pp.32–35, Beijing, P.R. China (in Chinese).
Zheng F. Studies on approaches of keyword spotting in unconstrained continuous speech. Ph.D. Dissertation, Beijing: Dept. of Comp. Sci. & Tech., Tsinghua Univ., June 1997, Taiwan.
Zheng F, Xu M X, Wu W H. Descriptions of the intra-state feature space in speech recognition. In '97 Int. Conf. Research on Computational Linguistics, Aug. 1997, pp.272–276.
Higgins A L, Wohlford R E. Keyword recognition using template concatenation. In ICASSP-85, 3: 1233–1236.
Lee C H, Rabiner L R. A frame synchronous network search algorithm for connected word recognition. IEEE Trans. ASSP, Nov. 1989, 37(11): 1649–1658.
Cox S J, Bridle J S. Unsupervised speaker adaptation by probabilistic spectrum fitting. In ICASSP-89, 1989, 3: 294–297.
Erell A, Weintraub M. Spectral estimation for noise robust speech recognition. In DARPA Speech & Natural Language Workshop, Cape Cod, MA, 1989.
Gish H, Chow Y L, Rohlicek J R. Probabilistic vector mapping of noisy speech parameters for HMM word spotting. In ICASSP-90, 1: 117–120.
Juang J, Rabiner L R. Signal restoration by spectral mapping. In ICASSP-87, pp.2368–2371.
Nadas A, Nahamoo D, Picheny M. Speech recognition using noise-adaptive prototype. IEEE Trans. ASSP, 1989, 37(10): 1495–1503.
Ng K, Gish H, Rohlicek J R. Robust mapping of noisy speech parameters for HMM word spotting. In ICASSP-92, 2: 109–112.
Rose R C, Paul D B. A hidden Markov model based keyword recognition system. In ICASSP-90, 1: 129–132.
Takebayashi Y, Tsuboi H, Kanazawa H. A robust speech recognition system using word-spotting with noise immunity learning. In ICASSP-91, pp.905–908.
Takebayashi Y, Tsuboi H, Kanazawa H. Keyword-spotting in noisy continuous speech using word pattern vector sub-abstraction and noise immunity learning. In ICASSP-92, 2: 85–88.
Xu M X, Zheng F, Wu W H. Rejection in speech recognition based on CDCPMs. In '97 Int. Conf. Research on Computational Linguistics, Aug. 1997, pp.412–419.

Author information

Authors and Affiliations

Speech Laboratory, Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, P.R. China
Zheng Fang, Xu Mingxing, Mou Xiaolong, Wu Jian, Wu Wenhu & Fang Ditang

Corresponding author

Correspondence to Zheng Fang.

Additional information

ZHENG Fang was born in 1967. He received his B.S., M.S. and Ph.D. degrees from Tsinghua University in 1990, 1992 and 1997, respectively. He is currently an Associate Professor at Tsinghua University, an Associate Director of the Department of Computer Science & Technology, the Director of the Speech Lab, and the Director of the Analog Devices Inc.-Tsinghua DSP Technology Research Center. His research interests include acoustic and language modeling, isolated and continuous speech recognition, keyword spotting, dictation, and language understanding.

XU Mingxing was born in 1973. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1995. He is currently a Ph.D. candidate in computer application. His research interests include speech recognition and language processing.

MOU Xiaolong was born in 1973. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1996. He is currently an M.S. candidate in computer application. His research interests include speech recognition and language processing.

WU Jian was born in 1975. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1998. He is currently an M.S. candidate in computer application. His research interests include speech recognition and language processing.

WU Wenhu was born in 1936. He studied in the Department of Electrical Engineering, Tsinghua University, from 1955 to 1958, and then in the Department of Automation, Tsinghua University, from 1958 to 1961. Since then, he has been teaching at Tsinghua University, where he is now a Full Professor in the Department of Computer Science and Technology; he was its Associate Director from 1990 to 1997. He is devoted to the study of Chinese speech recognition and understanding, especially speaker-independent Chinese speech recognition. He is the chairman of the Computer Spread Education Commission of the CCF (China Computer Federation). He led the China Team at IOI'89 through IOI'95 (International Olympiad in Informatics), where the team won many gold medals.

FANG Ditang was born in 1930. He received the B.S. degree from Shanghai Jiaotong University and the M.S. degree from Tsinghua University, both in electrical engineering, in 1953 and 1956, respectively. Since then, he has been teaching at Tsinghua University and is now a Full Professor in the Department of Computer Science and Technology. In 1979, he founded the Laboratory for Human-Machine Speech Communications and was its director from 1979 to 1990. The laboratory received the National Scientific Research and Technology Progress Award in 1987 and 1989, and a National Scientific Invention Award in 1990. He is the Deputy Chief of the Artificial Intelligence and Pattern Recognition Committee of the Chinese Computer Science Society.

About this article

Cite this article

Zheng, F., Xu, M., Mou, X. et al. HarkMan—A vocabulary-independent keyword spotter for spontaneous Chinese speech. J. Comput. Sci. & Technol. 14, 18–26 (1999). https://doi.org/10.1007/BF02952483

Received: 11 February 1998
Revised: 14 May 1998
Issue Date: January 1999
DOI: https://doi.org/10.1007/BF02952483
