The Video Conference Tool Robot ViCToR

Goeckel, Tom; Schiffer, Stefan; Wagner, Hermann; Lakemeyer, Gerhard

doi:10.1007/978-3-319-22876-1_6


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9245)

Included in the following conference series:

International Conference on Intelligent Robotics and Applications


Abstract

We present a robotic tool that autonomously follows a conversation to enable remote presence in video conferencing. When humans participate in a meeting via video conferencing tools, it is crucial that they can follow the conversation through both acoustic and visual input. To this end, we design and implement a video conferencing tool robot that uses binaural sound source localization as its main cue to autonomously orient towards the current speaker. To make the acoustic cue more robust against noise, we supplement the sound localization with a source detection stage. We also include a simple onset detector to retain fast response times. Since we use only two microphones, we face ambiguities as to whether a source is in front of or behind the device. We resolve these ambiguities with the help of face detection and additional moves. We tailor the system to our target scenarios in experiments with a four-minute scripted conversation. In these experiments we evaluate the influence of different system settings on the responsiveness and accuracy of the device.
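The abstract does not spell out the localization pipeline, so the following is only a minimal sketch under stated assumptions. It uses a plain time-domain cross-correlation to estimate the interaural time difference (ITD), a free-field, far-field model with an assumed microphone spacing and speed of sound, a crude RMS threshold standing in for the source detection stage, and an energy-ratio test standing in for the onset detector. All function names and parameters (`estimate_itd`, `azimuth_candidates`, `frame_is_active`, `is_onset`, `mic_distance`, `ratio`) are hypothetical and not taken from the paper. The sketch shows why two microphones leave a front/back ambiguity: every ITD maps to two mirror-image azimuths.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def frame_energy(frame):
    """Mean squared amplitude of one block of samples."""
    return sum(x * x for x in frame) / len(frame)


def frame_is_active(frame, threshold):
    """Crude RMS gate standing in for a source detection stage (hypothetical)."""
    return math.sqrt(frame_energy(frame)) >= threshold


def is_onset(prev_frame, frame, ratio=4.0, floor=1e-6):
    """Flag a sudden energy rise between consecutive frames (hypothetical onset test)."""
    return frame_energy(frame) > ratio * max(frame_energy(prev_frame), floor)


def estimate_itd(left, right, sample_rate, max_lag):
    """Interaural time difference in seconds via time-domain cross-correlation.

    A positive value means the right channel lags the left one, i.e. the
    sound reached the left microphone first.
    """
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def azimuth_candidates(itd, mic_distance):
    """Return the (front, back) azimuths in degrees that yield the same ITD.

    Free-field, far-field model: itd = mic_distance * sin(theta) / c.
    0 degrees is straight ahead; positive angles point toward the left microphone.
    """
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * itd / mic_distance))
    front = math.degrees(math.asin(s))
    back = 180.0 - front if front >= 0 else -180.0 - front
    return front, back


if __name__ == "__main__":
    rate, mic_distance, delay = 16000, 0.2, 5  # assumed setup values
    rng = random.Random(0)
    src = [rng.gauss(0.0, 1.0) for _ in range(1024 + delay)]
    left = src[delay:]           # left microphone hears the source first
    right = src[:len(left)]      # right microphone lags by `delay` samples
    max_lag = math.ceil(mic_distance / SPEED_OF_SOUND * rate)
    if frame_is_active(left, threshold=0.1):
        itd = estimate_itd(left, right, rate, max_lag)
        print("ITD (s):", itd, "candidates (deg):", azimuth_candidates(itd, mic_distance))
```

One generic way to choose between the two candidates is to turn by a known angle and re-estimate the ITD, since the front and back hypotheses predict shifts in opposite directions. The paper instead reports combining face detection with additional moves; the abstract does not give the details.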

Author information

Authors and Affiliations

Institute of Biology II, RWTH Aachen University, Aachen, Germany
Tom Goeckel & Hermann Wagner
Knowledge Based Systems Group (KBSG), RWTH Aachen University, Aachen, Germany
Stefan Schiffer & Gerhard Lakemeyer

Corresponding author

Correspondence to Stefan Schiffer.

Editor information

Editors and Affiliations

University of Portsmouth, Portsmouth, United Kingdom
Honghai Liu
Tokyo Metropolitan University, Tokyo, Japan
Naoyuki Kubota
Shanghai Jiao Tong University, Shanghai, China
Xiangyang Zhu
Karlsruhe Institute of Technology, Karlsruhe, Germany
Rüdiger Dillmann
University of Portsmouth, Portsmouth, United Kingdom
Dalin Zhou

About this paper

Cite this paper

Goeckel, T., Schiffer, S., Wagner, H., Lakemeyer, G. (2015). The Video Conference Tool Robot ViCToR. In: Liu, H., Kubota, N., Zhu, X., Dillmann, R., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2015. Lecture Notes in Computer Science, vol 9245. Springer, Cham. https://doi.org/10.1007/978-3-319-22876-1_6

DOI: https://doi.org/10.1007/978-3-319-22876-1_6
Published: 20 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22875-4
Online ISBN: 978-3-319-22876-1
eBook Packages: Computer Science, Computer Science (R0)