Target speech feature extraction using non-parametric correlation coefficient

Cluster Computing

Abstract

Speech recognition systems in automobiles have several weaknesses, including failure to recognize speech when environmental noise from inside and outside the car and other voices are mixed into the input. This paper therefore presents a technique for extracting only a selected target voice from input sound that mixes voices and noise. For selective speech extraction, the method composes a correlation map of auditory elements from the similarity between channels and their continuity over time, and extracts speech features using a non-parametric correlation coefficient. The proposed method was validated by showing that its average separation distortion decreased by 0.8630 dB. Selective feature extraction based on cross-correlation performed well, but selective feature extraction based on the non-parametric correlation performed better overall.

Acknowledgements

This work was supported by the Gachon University research fund of 2013 (GCU-2013-R107).

Author information

Correspondence to Sang Yeob Oh.

About this article

Cite this article

Oh, S.Y., Chung, K.Y.: Target speech feature extraction using non-parametric correlation coefficient. Cluster Comput. 17, 893–899 (2014). https://doi.org/10.1007/s10586-013-0284-5

  • DOI: https://doi.org/10.1007/s10586-013-0284-5
