Summary
A main task for domestic robots is to recognize object categories. Image-based approaches learn from large databases but have no access to the contextual knowledge available to a robot navigating through the rooms of a home. Consequently, we set out to exploit the knowledge available to the robot to constrain the task of object classification. Based on the estimation of free ground space in front of the robot, which is essential for safe navigation in a home setting, we show that we can greatly simplify self-localization, the detection of support surfaces, and the classification of objects. We further show that object classes can be acquired efficiently from 3D models from the Web if they are learned from automatically generated view data. We modelled 200 object classes (available from www.3d-net.org) and provide sample scene data for testing. Using this situated approach we can detect, for example, chairs with a 93% detection rate.
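To illustrate the idea of using free ground space as a scene constraint, the following sketch fits a dominant ground plane to a point cloud with a simple RANSAC and treats points above the plane as object candidates. This is an illustrative sketch only, not the system described above; the synthetic scene, the 2 cm inlier threshold, and the 5 cm candidate height are assumptions.

```python
import numpy as np

def fit_ground_plane(points, n_iters=200, dist_thresh=0.02, rng=None):
    """Estimate the dominant plane in a point cloud with a simple RANSAC.

    points: (N, 3) array of 3D points in metres.
    Returns ((normal, d), inlier_mask) for the plane n.x + d = 0.
    """
    rng = np.random.default_rng(rng)
    best_inliers, best_plane = None, None
    for _ in range(n_iters):
        # Sample three distinct points and form a candidate plane.
        idx = rng.choice(len(points), size=3, replace=False)
        p0, p1, p2 = points[idx]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal = normal / norm
        d = -normal @ p0
        inliers = np.abs(points @ normal + d) < dist_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers

# Synthetic scene: a noisy floor patch plus a small box standing on it.
rng = np.random.default_rng(0)
floor = np.column_stack([rng.uniform(-1, 1, 500),
                         rng.uniform(-1, 1, 500),
                         rng.normal(0.0, 0.005, 500)])   # z ~ 0 (ground)
box = np.column_stack([rng.uniform(0.1, 0.3, 100),
                       rng.uniform(0.1, 0.3, 100),
                       rng.uniform(0.1, 0.4, 100)])      # z in [0.1, 0.4]
cloud = np.vstack([floor, box])

(normal, d), inliers = fit_ground_plane(cloud, rng=0)
height = cloud @ normal + d
if normal[2] < 0:                 # orient "above the plane" upwards
    height = -height
# Points clearly above the ground plane are object candidates.
candidates = cloud[height > 0.05]
```

On real sensor data, the same fitted plane is what would constrain self-localization and restrict the object search to regions supported by detected surfaces.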
Zusammenfassung
A main task for robots is to recognize objects and object classes in order to find and manipulate them. Image-based approaches learn from large databases but have no access to contextual knowledge, for example about how a robot navigates through rooms. We therefore propose situated vision: exploiting contextual knowledge about the task and the application to improve object recognition. Based on the estimation of the free ground space in front of the robot, which is necessary for safe navigation, we show that localization, the detection of support surfaces, and the categorization of objects are all simplified. We further show that object classes can be learned efficiently from 3D Web data when the learning uses virtual 2.5D views that mimic how the robot's sensors perceive the real world. With this approach, 200 object classes were modelled (available at www.3d-net.org) and recognized in scenes, e.g. chairs with a detection rate of 93%.
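The view-based learning described above can be sketched by rendering virtual 2.5D views of a 3D model from viewpoints on a surrounding sphere, emulating what a depth sensor would see from each pose. The pinhole parameters, image size, and golden-spiral viewpoint sampling below are assumptions for this sketch, not the values used in the paper.

```python
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Rotation whose rows are the axes of a camera at `eye` looking at `target`."""
    z = target - eye
    z = z / np.linalg.norm(z)                 # viewing direction
    x = np.cross(z, up); x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z])

def render_depth(model_pts, eye, size=64, f=60.0):
    """Project model points through a pinhole camera into a crude 2.5D view.

    A per-pixel z-buffer keeps the nearest depth, so back-facing points
    are hidden much as they would be for a real range sensor.
    """
    R = look_at(eye)
    cam = (model_pts - eye) @ R.T             # points in the camera frame
    cam = cam[cam[:, 2] > 1e-6]               # keep points in front of the camera
    u = np.round(f * cam[:, 0] / cam[:, 2] + size / 2).astype(int)
    v = np.round(f * cam[:, 1] / cam[:, 2] + size / 2).astype(int)
    depth = np.full((size, size), np.inf)
    ok = (u >= 0) & (u < size) & (v >= 0) & (v < size)
    for ui, vi, zi in zip(u[ok], v[ok], cam[ok, 2]):
        depth[vi, ui] = min(depth[vi, ui], zi)   # z-buffer: nearest point wins
    return depth

def sphere_viewpoints(n=20, radius=2.0):
    """Roughly uniform camera positions on a sphere (golden-spiral sampling)."""
    k = np.arange(n)
    phi = np.arccos(1 - 2 * (k + 0.5) / n)       # polar angle
    theta = np.pi * (1 + 5 ** 0.5) * k           # azimuth
    return radius * np.column_stack([np.sin(phi) * np.cos(theta),
                                     np.sin(phi) * np.sin(theta),
                                     np.cos(phi)])

# Stand-in "3D model": 2000 points on a sphere of radius 0.5 m.
rng = np.random.default_rng(1)
v = rng.normal(size=(2000, 3))
model = 0.5 * v / np.linalg.norm(v, axis=1, keepdims=True)

# One depth image per viewpoint: the training views for a classifier.
views = [render_depth(model, eye) for eye in sphere_viewpoints()]
```

Each resulting depth image plays the role of one automatically generated training view; a descriptor computed on such views can then be matched against the robot's live sensor data.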
M. Zillich: The research leading to these results has received funding from the European Community for the projects robots@home (IST-6-043450), CogX (IST-7-215181), and HOBBIT (IST-7-288246).
Vincze, M., Wohlkinger, W., Aldoma, A. et al. Situiertes Sehen für bessere Erkennung von Objekten und Objektklassen. Elektrotech. Inftech. 129, 42–52 (2012). https://doi.org/10.1007/s00502-012-0072-6