The Video Conference Tool Robot ViCToR

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9245)

Abstract

We present a robotic tool that autonomously follows a conversation to enable remote presence in video conferencing. When humans participate in a meeting with the help of video conferencing tools, it is crucial that they are able to follow the conversation using both acoustic and visual input. To this end, we design and implement a video conferencing tool robot that uses binaural sound source localization as its main cue to autonomously orient towards the person currently speaking. To increase the robustness of the acoustic cue against noise, we supplement the sound localization with a source detection stage. We also include a simple onset detector to retain fast response times. Since we only use two microphones, we are confronted with ambiguities as to whether a source is in front of or behind the device. We resolve these ambiguities with the help of face detection and additional moves. We tailor the system to our target scenarios in experiments with a four-minute scripted conversation. In these experiments we evaluate the influence of different system settings on the responsiveness and accuracy of the device.
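
The front-back ambiguity mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain cross-correlation estimate of the interaural time difference (ITD), a far-field model, and a hypothetical microphone spacing of 0.2 m. The function names `estimate_itd` and `azimuth_candidates` are made up for this illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def estimate_itd(left, right, sample_rate, max_itd):
    """Estimate the interaural time difference (ITD) in seconds.

    Picks the lag that maximizes the cross-correlation of the two channels.
    A positive result means the right channel lags the left one.
    """
    max_lag = int(round(max_itd * sample_rate))
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both shifted samples exist.
        score = sum(left[i] * right[i + lag] for i in range(max_lag, n - max_lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


def azimuth_candidates(itd, mic_distance):
    """Map an ITD to the two azimuths (degrees) that two microphones cannot tell apart.

    Far-field model (an assumption): itd = mic_distance * sin(azimuth) / SPEED_OF_SOUND.
    0 degrees is straight ahead; positive angles point toward the left microphone.
    """
    s = max(-1.0, min(1.0, itd * SPEED_OF_SOUND / mic_distance))
    front = math.degrees(math.asin(s))
    back = 180.0 - front  # mirror image behind the array, same ITD
    return front, back


# Synthetic check: white noise reaching the right microphone 14 samples late.
random.seed(0)
sample_rate = 48000
mic_distance = 0.2  # metres, hypothetical spacing
left = [random.gauss(0.0, 1.0) for _ in range(2000)]
delay = 14
right = [0.0] * delay + left[:-delay]

itd = estimate_itd(left, right, sample_rate, mic_distance / SPEED_OF_SOUND)
front, back = azimuth_candidates(itd, mic_distance)
```

Under these assumptions the two candidates come out near 30 and 150 degrees. Both produce the same ITD, which is why a second cue is needed to choose between them; the abstract names face detection and additional moves for that purpose.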

Author information

Corresponding author

Correspondence to Stefan Schiffer.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Goeckel, T., Schiffer, S., Wagner, H., Lakemeyer, G. (2015). The Video Conference Tool Robot ViCToR. In: Liu, H., Kubota, N., Zhu, X., Dillmann, R., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2015. Lecture Notes in Computer Science, vol 9245. Springer, Cham. https://doi.org/10.1007/978-3-319-22876-1_6

  • DOI: https://doi.org/10.1007/978-3-319-22876-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22875-4

  • Online ISBN: 978-3-319-22876-1

  • eBook Packages: Computer Science, Computer Science (R0)
