Integration of Audio and Video Clues for Source Localization by a Robotic Head

Parisi, Raffaele; Comminiello, Danilo; Scarpiniti, Michele; Uncini, Aurelio

doi:10.1007/978-3-319-18164-6_15

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 37)

Abstract

This work presents the first step toward integrating audio and video information for the localization of speakers in closed environments. The proposed method is based on binaural source localization followed by face recognition and tracking, and was implemented in a real environment. Preliminary results demonstrate the effectiveness of this approach.
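
To make the audio stage concrete, the following is a minimal sketch of a generic binaural localization step: it estimates the interaural time difference (ITD) between two microphone signals by plain cross-correlation and maps it to an azimuth with a far-field model. It is not the authors' method, whose details are not given in the abstract. The function names, the microphone spacing, the speed of sound, and the synthetic test signal are all assumptions introduced for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
MIC_SPACING = 0.18      # m, hypothetical distance between the two microphones


def estimate_itd(left, right, fs):
    """Delay of `right` relative to `left` in seconds (positive if right lags)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Full cross-correlation: the entry for lag k holds sum_n right[n + k] * left[n].
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    return lags[np.argmax(xcorr)] / fs


def itd_to_azimuth(itd, spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Far-field azimuth in degrees; positive means the source is nearer the left microphone."""
    # Far-field model: itd = spacing * sin(azimuth) / c. Clip so arcsin stays defined.
    s = np.clip(c * itd / spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


def make_delayed_pair(n_samples=1024, delay=5, seed=0):
    """Synthetic test signals: white noise and a copy delayed by `delay` > 0 samples."""
    rng = np.random.default_rng(seed)
    left = rng.standard_normal(n_samples)
    right = np.concatenate([np.zeros(delay), left[:-delay]])
    return left, right


fs = 16000  # Hz, assumed sampling rate
left, right = make_delayed_pair()
itd = estimate_itd(left, right, fs)
print(f"ITD: {itd * 1e6:.1f} us, azimuth: {itd_to_azimuth(itd):.1f} deg")
```

In an audio-video pipeline of the kind the abstract describes, an azimuth like this could be used to steer the camera toward the speaker before the face detection stage runs. Plain cross-correlation is known to degrade under reverberation, so a real system would likely need a more robust delay estimator.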

Author information

Authors and Affiliations

DIET Dept., University of Rome “Sapienza”, Rome, Italy
Raffaele Parisi, Danilo Comminiello, Michele Scarpiniti & Aurelio Uncini

Corresponding author

Correspondence to Raffaele Parisi.

Editor information

Editors and Affiliations

Computer Science Department, University of Milano, Milano, Italy
Simone Bassis
Dipartimento di Psicologia & Vietri sul Mare (SA), Seconda Università di Napoli, International Institute for Advanced Scientific Studies (IIASS), Caserta, Italy
Anna Esposito
Department of Civil, Environmental, Energy, and Material Engineering, University Mediterranea of Reggio Calabria, Reggio Calabria, Italy
Francesco Carlo Morabito

About this chapter

Cite this chapter

Parisi, R., Comminiello, D., Scarpiniti, M., Uncini, A. (2015). Integration of Audio and Video Clues for Source Localization by a Robotic Head. In: Bassis, S., Esposito, A., Morabito, F. (eds) Advances in Neural Networks: Computational and Theoretical Issues. Smart Innovation, Systems and Technologies, vol 37. Springer, Cham. https://doi.org/10.1007/978-3-319-18164-6_15

DOI: https://doi.org/10.1007/978-3-319-18164-6_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18163-9
Online ISBN: 978-3-319-18164-6
eBook Packages: Engineering, Engineering (R0)
