Abstract
In this paper, we investigate methods for adapting a filled pause (FP) disfluency removal system to different data properties. We present a gradient descent algorithm for parameter optimization that achieves 80.6% recall and 87.7% precision on the FP dataset, and 46.5% recall and 79.6% precision on the FPElo dataset. These results compare with those produced by hand-optimization on the test set. Furthermore, we investigate the impact of cross-validation and training set selection on recognizer output, with the aim of improving the speech retrieval system.
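As a reminder of how the two reported metrics are defined, the following is a minimal sketch that computes precision and recall from detection counts. The function name `precision_recall` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper or its datasets.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a detector, given raw counts.

    precision = correct detections / all detections made
    recall    = correct detections / all items that should have been detected
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against division by zero when nothing was detected or present.
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (assumed for illustration, not from the paper).
p, r = precision_recall(true_positives=80, false_positives=20, false_negatives=120)
print(f"precision={p:.3f} recall={r:.3f}")  # prints "precision=0.800 recall=0.400"
```

A system with high precision but lower recall, as in this illustrative case, removes few fluent words by mistake but leaves many filled pauses undetected.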
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hamzah, R., Jamil, N., Seman, N. (2014). Nurturing Filled Pause Detection for Spontaneous Speech Retrieval. In: Jaafar, A., et al. Information Retrieval Technology. AIRS 2014. Lecture Notes in Computer Science, vol 8870. Springer, Cham. https://doi.org/10.1007/978-3-319-12844-3_39
DOI: https://doi.org/10.1007/978-3-319-12844-3_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12843-6
Online ISBN: 978-3-319-12844-3
eBook Packages: Computer Science (R0)