
Object Discovery Using CNN Features in Egocentric Videos

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9117)

Abstract

Lifelogging devices based on photo/video are spreading faster every day. This growth offers great opportunities to develop methods for extracting meaningful information about the user wearing the device and his/her environment. In this paper, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Given the egocentric video sequence acquired by the camera, our method uses both appearance features extracted by means of a deep convolutional neural network and an object refill methodology, which allows discovering objects even when they appear only a few times in the image collection. We validate our method on a sequence of 1,000 egocentric daily images and obtain an F-measure of 0.5, which is 0.17 higher than the state-of-the-art approach.
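The semi-supervised discovery strategy summarized above can be illustrated with a toy sketch: stand-in feature vectors take the place of CNN features, a small labeled seed set initializes per-class centroids, and the remaining samples are assigned iteratively by nearest centroid. Function and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def discover_labels(features, seed_labels, n_iter=10):
    """Propagate labels from a small seeded subset to all samples
    by iterative nearest-centroid assignment (a k-means-style step).
    `seed_labels` holds -1 for unlabeled samples."""
    labels = seed_labels.copy()
    classes = np.unique(seed_labels[seed_labels >= 0])
    for _ in range(n_iter):
        # Centroid of each class, computed from currently labeled samples.
        centroids = np.stack(
            [features[labels == c].mean(axis=0) for c in classes])
        # Assign every sample to its nearest class centroid.
        dists = np.linalg.norm(
            features[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = classes[dists.argmin(axis=1)]
        # Seed samples always keep their given labels.
        new_labels[seed_labels >= 0] = seed_labels[seed_labels >= 0]
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy example: two well-separated 2-D "feature" clusters,
# one labeled seed per class, the rest unlabeled (-1).
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (5, 2)),
                   rng.normal(5.0, 0.1, (5, 2))])
seeds = np.full(10, -1)
seeds[0], seeds[5] = 0, 1
print(discover_labels(feats, seeds))  # → [0 0 0 0 0 1 1 1 1 1]
```

In the paper's setting the feature vectors would come from a deep CNN rather than a random generator; the sketch only shows the label-propagation step.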


Notes

  1. Refilling the space with more samples of the same class can form a more compact and clear cluster.

  2. In any case, the refilled samples, which were already labeled, can only get their labels changed if they did not belong to the initial selection set (40%).
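The relabeling rule in note 2 can be sketched as a small function; the data layout (dicts and sets keyed by sample id) and all names are assumptions for illustration only:

```python
def update_refilled_labels(labels, proposed, initial_selection):
    """Apply the refill relabeling rule: a refilled (already labeled)
    sample accepts a proposed new label only if it was NOT part of the
    initial selection set; initial-selection labels stay frozen."""
    updated = dict(labels)
    for sample_id, new_label in proposed.items():
        if sample_id not in initial_selection:
            updated[sample_id] = new_label
    return updated

labels = {"a": "cup", "b": "cup", "c": "phone"}
proposed = {"a": "phone", "c": "cup"}
# "a" is in the initial selection, so its label is frozen;
# "c" was refilled later and may be relabeled.
updated = update_refilled_labels(labels, proposed, initial_selection={"a"})
print(updated)  # → {'a': 'cup', 'b': 'cup', 'c': 'cup'}
```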



Author information

Correspondence to Marc Bolaños.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bolaños, M., Garolera, M., Radeva, P. (2015). Object Discovery Using CNN Features in Egocentric Videos. In: Paredes, R., Cardoso, J., Pardo, X. (eds.) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol. 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19389-2

  • Online ISBN: 978-3-319-19390-8

  • eBook Packages: Computer Science (R0)
