Abstract
Speech recognition systems in automobiles often fail because the target speech is mixed with environmental noise from inside and outside the car and with other voices. This paper therefore proposes a technique for extracting only the selected target voice from an input that is a mixture of voices and noise. The selective speech extraction builds a correlation map of auditory elements from the similarity between channels and the continuity over time, and extracts speech features using a non-parametric correlation coefficient. The proposed method was validated by showing that its average separation distortion decreased by 0.8630 dB. Selective feature extraction based on cross-correlation performed well, but selective feature extraction based on non-parametric correlation performed better overall.
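To make the idea of a non-parametric correlation between channels concrete, here is a minimal sketch. It assumes the coefficient is Spearman's rank correlation, which the abstract does not specify, and it is not the authors' implementation. The function names (`average_ranks`, `pearson`, `spearman`, `adjacent_channel_map`) and the toy data are hypothetical, chosen only for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary (parametric) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # a constant channel carries no correlation information
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Assumed non-parametric coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def adjacent_channel_map(frame):
    """frame[c] holds the samples of channel c within one time frame.

    Returns the rank correlation between each pair of neighbouring channels,
    i.e. one row of a hypothetical channel-similarity map.
    """
    return [spearman(frame[c], frame[c + 1]) for c in range(len(frame) - 1)]


# Toy frame with three channels: channel 1 is a monotonic (nonlinear) function
# of channel 0, and channel 2 runs in the opposite direction.
toy_frame = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 4.0, 9.0, 16.0],
    [4.0, 3.0, 2.0, 1.0],
]
similarity_row = adjacent_channel_map(toy_frame)
```

Because the coefficient depends only on the ordering of the samples, a monotonic but nonlinear relation between two channels still yields the maximum rank correlation, whereas the ordinary correlation coefficient of the same data falls below 1. This robustness is one plausible reason a rank-based measure could compare favourably with cross-correlation on noisy mixtures.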
Acknowledgements
This work was supported by the Gachon University research fund of 2013 (GCU-2013-R107).
Cite this article
Oh, S.Y., Chung, KY. Target speech feature extraction using non-parametric correlation coefficient. Cluster Comput 17, 893–899 (2014). https://doi.org/10.1007/s10586-013-0284-5
DOI: https://doi.org/10.1007/s10586-013-0284-5