
Situiertes Sehen für bessere Erkennung von Objekten und Objektklassen

Situated vision for improved detection of objects and object classes


Summary

A main task for domestic robots is to recognize object categories. Image-based approaches learn from large databases but have no access to the contextual knowledge that is available to a robot navigating the rooms of a home. We therefore set out to exploit the knowledge available to the robot to constrain the task of object classification. Based on the estimation of the free ground space in front of the robot, which is essential for safe navigation in a home setting, we show that self-localization, the detection of support surfaces, and the classification of objects can be greatly simplified. We further show that object classes can be acquired efficiently from 3D models on the Web if they are learned from automatically generated view data. We modelled 200 object classes (available from www.3d-net.org) and provide sample scene data for testing. Using this situated approach we detect, for example, chairs with a detection rate of 93 per cent.
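To make the estimation of free ground space concrete, here is a minimal sketch, assuming a point cloud in a robot-centred, metre-scale frame with the z-axis roughly vertical; it is not the authors' implementation. It fits the dominant floor plane with RANSAC and keeps the points above it as candidates for support surfaces and objects. The function names, thresholds, and the synthetic test cloud are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): RANSAC floor-plane estimation on a
# 3D point cloud, then segmentation of the points above the floor as
# candidates for support surfaces and objects.
import numpy as np

def fit_plane_ransac(points, n_iters=200, inlier_thresh=0.02, seed=None):
    """Fit a plane n.x + d = 0 to an (N, 3) array with simple RANSAC."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p0)
        inliers = np.abs(points @ normal + d) < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (normal, d), inliers
    return best_model, best_inliers

def segment_above_plane(points, plane, min_height=0.05, max_height=1.5):
    """Keep points between min_height and max_height above the plane
    (assumes the robot frame's z-axis points roughly upward)."""
    normal, d = plane
    if normal[2] < 0:                        # orient the plane normal upward
        normal, d = -normal, -d
    height = points @ normal + d
    return points[(height > min_height) & (height < max_height)]

if __name__ == "__main__":
    # Synthetic example: a flat floor patch plus a small box standing on it.
    rng = np.random.default_rng(0)
    floor = np.column_stack([rng.uniform(-1, 1, 2000),
                             rng.uniform(0, 2, 2000),
                             rng.normal(0.0, 0.005, 2000)])
    box = np.column_stack([rng.uniform(0.2, 0.4, 300),
                           rng.uniform(0.8, 1.0, 300),
                           rng.uniform(0.05, 0.35, 300)])
    cloud = np.vstack([floor, box])
    plane, floor_inliers = fit_plane_ransac(cloud, seed=1)
    objects = segment_above_plane(cloud, plane)
    print(f"floor inliers: {floor_inliers.sum()}, points above floor: {len(objects)}")
```

A point-cloud library such as PCL offers equivalent plane segmentation; the sketch only illustrates the ordering the summary describes: estimate the free floor first, then interpret everything else relative to it.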

Zusammenfassung

A main task for robots is to recognize objects and object classes in order to find and handle them. Image-based approaches learn from large databases but have no access to contextual knowledge, for example about how robots navigate rooms. We therefore propose the approach of situated vision, which uses contextual knowledge about the task and the application to improve object recognition. Based on the estimation of the free floor space in front of the robot, which is necessary for safe navigation, we show that localization, the detection of surfaces, and the categorization of objects are thereby simplified. We further show that object classes can be learned efficiently from 3D Web data if the learning uses virtual 2.5D views that approximate how the robot's sensors perceive the real world. With this approach, 200 object classes (available at www.3d-net.org) were modelled and recognized in scenes, e.g. chairs with a detection rate of 93 per cent.
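To illustrate the second part of the abstract, learning object classes from 3D Web models via virtual 2.5D views, the sketch below is again only an assumption-laden outline, not the published pipeline: it renders coarse orthographic depth views of point-sampled 3D models from several yaw angles, stores each view as a flattened depth-grid descriptor, and classifies an observed view by nearest-neighbour matching. The grid size, the number of views, and the dictionary-of-models input format are invented for the example.

```python
# Minimal sketch (not the paper's pipeline): synthetic 2.5D views of 3D point
# models, coarse depth-grid descriptors, and nearest-neighbour classification.
import numpy as np

def depth_view(points, yaw, grid=16):
    """Orthographic 2.5D view of an (N, 3) point model rotated by `yaw` about z.
    Returns a grid x grid depth image holding the nearest surface per cell."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    p = points @ R.T
    # x and z become the image axes, y becomes the viewing depth.
    xi = np.clip(((p[:, 0] - p[:, 0].min()) / (np.ptp(p[:, 0]) + 1e-9)
                  * (grid - 1)).astype(int), 0, grid - 1)
    zi = np.clip(((p[:, 2] - p[:, 2].min()) / (np.ptp(p[:, 2]) + 1e-9)
                  * (grid - 1)).astype(int), 0, grid - 1)
    img = np.full((grid, grid), np.inf)
    np.minimum.at(img, (zi, xi), p[:, 1])    # keep the closest point per cell
    img[np.isinf(img)] = p[:, 1].max()       # empty cells -> background depth
    img -= img.min()
    return img / (img.max() + 1e-9)          # normalised view descriptor

def build_view_database(models, n_views=12):
    """models: dict mapping class name -> (N, 3) point array (e.g. sampled from
    a downloaded mesh). Returns stacked descriptors and their class labels."""
    descs, labels = [], []
    for name, pts in models.items():
        for yaw in np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False):
            descs.append(depth_view(pts, yaw).ravel())
            labels.append(name)
    return np.array(descs), labels

def classify(observed_view, descs, labels):
    """Nearest-neighbour match of an observed 2.5D view against the database."""
    dists = np.linalg.norm(descs - observed_view.ravel(), axis=1)
    return labels[int(np.argmin(dists))]

# Hypothetical usage, assuming chair_points and observed_points exist:
#   db, labels = build_view_database({"chair": chair_points})
#   print(classify(depth_view(observed_points, 0.0), db, labels))
```

The published system presumably uses richer shape descriptors and denser viewpoint sampling; the sketch only mirrors the idea of matching the robot's sensor view against automatically generated views of Web models.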




Additional information

M. Zillich: The research leading to these results has received funding from the European Community for projects robots@home (IST-6-043450), CogX (IST-7-215181) and HOBBIT (IST-7-288246).


About this article

Cite this article

Vincze, M., Wohlkinger, W., Aldoma, A. et al. Situiertes Sehen für bessere Erkennung von Objekten und Objektklassen. Elektrotech. Inftech. 129, 42–52 (2012). https://doi.org/10.1007/s00502-012-0072-6

