Abstract
In this paper, we introduce a non-voice rejection method that performs Voice/Non-Voice (V/NV) classification using a fundamental frequency (F0) estimator called YIN. Although current speech recognition technology has achieved high performance, it is insufficient for applications that require high reliability, such as voice control of powered wheelchairs for disabled persons. A V/NV classification algorithm, which rejects non-voice input in Voice Activity Detection (VAD), helps to realize a highly reliable system. The proposed V/NV classification uses the ratio of the reliable F0 contour to the whole input interval. To evaluate the performance of the proposed method, we used 1567 voice commands and 447 noise inputs from powered wheelchair control in a real environment. The results indicate that the recall rate is 97% when the lowest threshold is selected for noise classification with 99% precision in VAD.
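The ratio-based decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an F0 estimator such as YIN has already produced one aperiodicity value per frame for the detected interval, that a frame counts as reliable when this value falls below a cutoff, and that the interval is accepted as voice when the fraction of reliable frames reaches a second cutoff. The function names and both threshold values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def reliable_f0_ratio(aperiodicity: Sequence[float],
                      reliability_threshold: float = 0.1) -> float:
    """Return the fraction of frames whose F0 estimate is treated as reliable.

    Assumption: a frame is reliable when its aperiodicity value is below
    reliability_threshold. The abstract does not state the actual criterion.
    """
    if len(aperiodicity) == 0:
        return 0.0
    reliable = sum(1 for a in aperiodicity if a < reliability_threshold)
    return reliable / len(aperiodicity)


def is_voice(aperiodicity: Sequence[float],
             reliability_threshold: float = 0.1,
             ratio_threshold: float = 0.3) -> bool:
    """Accept the interval as voice if enough of it has a reliable F0.

    Both thresholds are hypothetical placeholders, not values from the paper.
    """
    ratio = reliable_f0_ratio(aperiodicity, reliability_threshold)
    return ratio >= ratio_threshold


# Made-up per-frame aperiodicity values for two VAD intervals.
voiced_like = [0.05, 0.04, 0.50, 0.06, 0.03]  # mostly periodic frames
noise_like = [0.60, 0.70, 0.50, 0.08, 0.90]   # mostly aperiodic frames

print(reliable_f0_ratio(voiced_like), is_voice(voiced_like))
print(reliable_f0_ratio(noise_like), is_voice(noise_like))
```

In this sketch, raising `ratio_threshold` rejects more noise at the cost of rejecting more genuine commands, which is the kind of trade-off that the reported recall and precision figures reflect.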
References
Lee, S.W., Tanaka, K., Itoh, Y.: Combining Multiple Subword Representations for Open-Vocabulary Spoken Document Retrieval. In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 505–508 (2005)
Sadohara, K., Lee, S.W., Kojima, H.: Topic Segmentation Using Kernel Principal Component Analysis for Sub-Phonetic Segments. Technical Report of IEICE, AI2004-77, pp. 37–41 (2005)
Suk, S.Y., Lee, S.W., Kojima, H., Makino, S.: Multi-mixture based PDT-SSS Algorithm for Extension of HM-Net Structure. In: Proc. 2005 September Meeting of the Acoustical Society of Japan (2005)
Sasou, A., Asano, F., Tanaka, K., Nakamura, S.: HMM-Based Feature Compensation Method: An Evaluation Using the AURORA2. In: Proc. Int. Conf. Spoken Language Processing, pp. 121–124 (2004)
Johnson, D.H., Dudgeon, D.E.: Array Signal Processing. Prentice Hall, Englewood Cliffs (1993)
Sasou, A., Kojima, H.: Multi-channel speech input system for a wheelchair. In: Proc. 2006 Mar. Meeting of the Acoustical Society of Japan (2006)
Rouat, J., Liu, Y.C., Morrisette, D.: A pitch determination and voiced/unvoiced decision algorithm for noisy speech. Speech Communication 21 (1997)
Ahmadi, S., Spanias, A.S.: Cepstrum-based pitch detection using a new statistical V/UV classification algorithm. IEEE Trans. Speech Audio Processing 7(3), 333–339 (1999)
Mousset, E., Ainsworth, W.A., Fonollosa, J.A.R.: A comparison of several recent methods of fundamental frequency and voicing decision estimation. In: Proc. Int. Conf. Spoken Language Processing, vol. 2, pp. 1273–1276 (1996)
Lee, A., Kawahara, T., Shikano, K.: Julius — an open source real-time large vocabulary recognition engine. In: Proc. European Conference on Speech Communication and Technology, pp. 1691–1694 (2001)
de Cheveigné, A., Kawahara, H.: YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America 111(4), 1917–1930 (2002)
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Suk, SY., Chung, HY., Kojima, H. (2007). Voice/Non-Voice Classification Using Reliable Fundamental Frequency Estimator for Voice Activated Powered Wheelchair Control. In: Lee, YH., Kim, HN., Kim, J., Park, Y., Yang, L.T., Kim, S.W. (eds) Embedded Software and Systems. ICESS 2007. Lecture Notes in Computer Science, vol 4523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72685-2_33
DOI: https://doi.org/10.1007/978-3-540-72685-2_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72684-5
Online ISBN: 978-3-540-72685-2
eBook Packages: Computer Science, Computer Science (R0)