Web-enhanced object category learning for domestic robots

  • Original Research Paper
  • Intelligent Service Robotics

Abstract

We present a system architecture for domestic robots that allows them to learn object categories after initially learning a single sample object. We explore the situation in which a human teaches a robot a novel object and the robot then enhances that learning with a large amount of image data from the Internet. The main goal of this research is to give a robot the capability to enhance its own learning while minimizing the time and effort a human needs to train it. Our active learning approach consists of learning the object’s name through a speech interface and creating a visual object model with a depth-based attention model adapted to the robot’s personal space. Given the object’s name (keyword), a large number of object-related images are collected from two main image sources (Google Images and the LabelMe website). We separate good training samples from noisy images in two steps: (1) similar image selection using a Simile Selector Classifier, and (2) non-real image filtering using a variant of Gaussian Discriminant Analysis. After web image selection, object category classifiers are trained and then tested on different objects of the same category. Our experiments demonstrate the effectiveness of our robot learning approach.
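
The second filtering step can be illustrated with a minimal sketch of Gaussian Discriminant Analysis. The abstract does not specify which variant the authors use, so the sketch below is an assumption throughout: it fits one Gaussian per class with a diagonal covariance (which makes it equivalent to Gaussian naive Bayes), and the two-dimensional features, the class labels `real` and `non_real`, and all function names are hypothetical.

```python
import math

def fit_gda(samples, labels):
    """Fit one diagonal-covariance Gaussian per class, plus a log class prior."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        n, dim = len(rows), len(rows[0])
        mean = [sum(r[d] for r in rows) / n for d in range(dim)]
        # A small floor keeps every variance strictly positive.
        var = [max(sum((r[d] - mean[d]) ** 2 for r in rows) / n, 1e-6)
               for d in range(dim)]
        model[cls] = (math.log(n / len(samples)), mean, var)
    return model

def log_posterior(model, x, cls):
    """Unnormalised log posterior: log prior + Gaussian log-likelihood."""
    log_prior, mean, var = model[cls]
    log_lik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                  for xi, m, v in zip(x, mean, var))
    return log_prior + log_lik

def predict(model, x):
    """Return the class with the highest posterior for feature vector x."""
    return max(model, key=lambda cls: log_posterior(model, x, cls))

def filter_real(model, candidates):
    """Keep only the candidates classified as real photographs."""
    return [x for x in candidates if predict(model, x) == "real"]

# Toy, made-up feature vectors (hypothetical: two normalised image statistics).
train_x = [[0.8, 0.6], [0.9, 0.5], [0.7, 0.7],
           [0.2, 0.1], [0.1, 0.2], [0.3, 0.15]]
train_y = ["real", "real", "real", "non_real", "non_real", "non_real"]

model = fit_gda(train_x, train_y)
kept = filter_real(model, [[0.85, 0.55], [0.15, 0.15]])
print(kept)
```

In this toy setup, web images whose features fall closer to the `non_real` Gaussian (for example drawings or clip art) would be discarded before the category classifiers are trained. A full-covariance or shared-covariance model would be an equally plausible reading of "a variant of Gaussian Discriminant Analysis".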

Acknowledgments

This work was supported in part by Grant-in-Aid for Scientific Research (C) 23500242.

Author information

Corresponding author

Correspondence to Christian I Penaloza.

About this article

Cite this article

Penaloza, C.I., Mae, Y., Ohara, K. et al. Web-enhanced object category learning for domestic robots. Intel Serv Robotics 6, 53–67 (2013). https://doi.org/10.1007/s11370-012-0126-y

  • DOI: https://doi.org/10.1007/s11370-012-0126-y