Abstract
This work presents the first step toward integrating audio and video information to localize speakers in closed environments. The proposed method combines binaural source localization with face recognition and tracking, and it was implemented in a real environment. Preliminary results demonstrate the effectiveness of this approach.
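Binaural localization usually relies on interaural cues such as the interaural time difference (ITD) and the interaural level difference (ILD). The sketch below is a minimal illustration and not the authors' method. It estimates the ITD with GCC-PHAT cross-correlation and maps it to an azimuth using a free-field, far-field two-microphone model. The ear spacing (0.18 m), the speed of sound (343 m/s), the sampling rate and the sign convention (positive angles toward the left ear) are all assumptions. A real head shadows and diffracts sound, so in practice you would need an HRTF-based mapping and reverberation-robust prefiltering.

```python
import numpy as np


def gcc_phat_itd(left, right, fs, max_tau=None):
    """Estimate the interaural time difference (ITD) in seconds via GCC-PHAT.

    A positive value means the right channel lags the left one,
    i.e. the source is closer to the left ear.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = left.size + right.size  # zero-pad so the correlation is linear, not circular
    spec = np.fft.rfft(right, n=n) * np.conj(np.fft.rfft(left, n=n))
    spec /= np.abs(spec) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(spec, n=n)

    # Restrict the search to physically plausible lags (|tau| <= max_tau).
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Rearrange so that index i corresponds to lag (i - max_shift).
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


def itd_to_azimuth(tau, ear_distance=0.18, c=343.0):
    """Map an ITD to azimuth in degrees with the far-field model tau = d*sin(theta)/c.

    Assumed convention: positive angles point toward the left ear.
    """
    s = np.clip(c * tau / ear_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


# Synthetic check: white noise that reaches the right ear 5 samples late.
rng = np.random.default_rng(0)
fs = 16000
src = rng.standard_normal(fs)  # 1 s of noise
delay = 5
left = src
right = np.concatenate((np.zeros(delay), src[:-delay]))
tau = gcc_phat_itd(left, right, fs, max_tau=0.18 / 343.0)
print(round(tau * fs))  # expected: 5
print(f"azimuth ~ {itd_to_azimuth(tau):.1f} deg")
```

The sketch uses PHAT weighting because whitening the cross-spectrum sharpens the correlation peak, which tends to help in reverberant rooms. It stands in for whatever ITD estimator the chapter actually uses.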
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Parisi, R., Comminiello, D., Scarpiniti, M., Uncini, A. (2015). Integration of Audio and Video Clues for Source Localization by a Robotic Head. In: Bassis, S., Esposito, A., Morabito, F. (eds) Advances in Neural Networks: Computational and Theoretical Issues. Smart Innovation, Systems and Technologies, vol 37. Springer, Cham. https://doi.org/10.1007/978-3-319-18164-6_15
DOI: https://doi.org/10.1007/978-3-319-18164-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18163-9
Online ISBN: 978-3-319-18164-6
eBook Packages: Engineering, Engineering (R0)