Abstract
We present a robotic tool that autonomously follows a conversation to enable remote presence in video conferencing. When people participate in a meeting through video conferencing tools, it is crucial that they can follow the conversation with both acoustic and visual input. To this end, we design and implement a video conferencing tool robot that uses binaural sound source localization as its main cue to autonomously orient towards the current speaker. To make the acoustic cue more robust against noise, we supplement the sound localization with a source detection stage. We also include a simple onset detector to retain fast response times. Because we use only two microphones, it is ambiguous whether a source is in front of or behind the device. We resolve these ambiguities with the help of face detection and additional movements. We tailor the system to our target scenarios in experiments with a four-minute scripted conversation. In these experiments, we evaluate the influence of different system settings on the responsiveness and accuracy of the device.
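The front-back ambiguity of a two-microphone setup can be illustrated with a minimal sketch. It is not the method used in the paper, whose details are not given in this abstract. It assumes a far-field source, a plain cross-correlation estimate of the interaural time difference (ITD), and hypothetical values for microphone spacing and sampling rate. All function and constant names are illustrative.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
MIC_DISTANCE = 0.18     # m, hypothetical microphone spacing
SAMPLE_RATE = 48000     # Hz, hypothetical sampling rate


def itd_for_azimuth(azimuth_deg):
    """Far-field ITD in seconds (0 deg = ahead, 90 = right, 180 = behind)."""
    return MIC_DISTANCE * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def delay(signal, samples):
    """Return a copy of signal delayed by a non-negative number of samples."""
    return [0.0] * samples + signal[:len(signal) - samples]


def best_lag(left, right, max_lag):
    """Lag in samples that maximizes the cross-correlation of both channels.

    A positive lag means the left channel is a delayed copy of the right one.
    """
    def corr(lag):
        start = max(0, -lag)
        stop = min(len(right), len(left) - lag)
        return sum(left[i + lag] * right[i] for i in range(start, stop))
    return max(range(-max_lag, max_lag + 1), key=corr)


def candidate_azimuths(lag):
    """Map a lag to the two azimuths (front, back) that explain it equally."""
    x = SPEED_OF_SOUND * lag / SAMPLE_RATE / MIC_DISTANCE
    front = math.degrees(math.asin(max(-1.0, min(1.0, x))))
    return front, 180.0 - front


if __name__ == "__main__":
    rng = random.Random(0)
    right = [rng.uniform(-1.0, 1.0) for _ in range(512)]
    # Simulate a source at 30 deg: the left microphone hears it later.
    true_lag = round(itd_for_azimuth(30.0) * SAMPLE_RATE)
    left = delay(right, true_lag)
    lag = best_lag(left, right, max_lag=26)
    print(lag, candidate_azimuths(lag))
```

Because the ITD depends only on the sine of the azimuth, a source at 30 degrees and one at 150 degrees produce the same lag. The acoustic cue alone therefore cannot separate the two candidates, which is why an additional cue such as face detection or a small turn of the device is needed.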
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Goeckel, T., Schiffer, S., Wagner, H., Lakemeyer, G. (2015). The Video Conference Tool Robot ViCToR. In: Liu, H., Kubota, N., Zhu, X., Dillmann, R., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2015. Lecture Notes in Computer Science, vol 9245. Springer, Cham. https://doi.org/10.1007/978-3-319-22876-1_6
DOI: https://doi.org/10.1007/978-3-319-22876-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22875-4
Online ISBN: 978-3-319-22876-1
eBook Packages: Computer Science, Computer Science (R0)