Abstract
Lifelogging devices based on photo/video are spreading faster every day. This growth creates a great opportunity to develop methods for extracting meaningful information about the user wearing the device and his/her environment. In this paper, we propose a semi-supervised strategy for easily discovering objects relevant to the person wearing a first-person camera. Applied to the egocentric video sequence acquired by the camera, our method combines appearance features extracted by means of a deep convolutional neural network with an object refill methodology, which allows it to discover objects even when they appear only sparsely in the image collection. We validate our method on a sequence of 1000 egocentric daily images and obtain an F-measure of 0.5, which is 0.17 higher than that of the state-of-the-art approach.
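To make the pipeline concrete, the following is a minimal sketch of the discovery loop described in the abstract: candidate object regions are described with CNN features, clustered, and labels from a small initially labeled selection are propagated cluster by cluster, after "refilling" the feature space with extra labeled samples (see Notes 1 and 2 below). All names here (discover_objects, refill_feats, the cluster count) are illustrative assumptions, not the authors' implementation; k-means stands in for whatever clustering the paper actually uses.

```python
# Hypothetical sketch, not the authors' exact algorithm: cluster CNN
# features of candidate regions, refill with extra labeled samples, then
# propagate labels from the initially labeled selection.
import numpy as np
from sklearn.cluster import KMeans

def discover_objects(features, labels, refill_feats, refill_labels,
                     n_clusters=20, seed=0):
    """features: (N, D) CNN descriptors of candidate regions.
    labels: (N,) int array, -1 for unlabeled samples, a class id for the
    initially labeled selection. refill_feats/refill_labels hold extra,
    already-labeled samples appended to make clusters more compact."""
    # Refill: adding known samples of each class pulls same-class points
    # into tighter, clearer clusters (Note 1).
    all_feats = np.vstack([features, refill_feats])
    all_labels = np.concatenate([labels, refill_labels])

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    assign = km.fit_predict(all_feats)

    # Propagate the majority label of each cluster to its unlabeled members.
    out = all_labels.copy()
    for c in range(n_clusters):
        idx = np.where(assign == c)[0]
        known = all_labels[idx][all_labels[idx] >= 0]
        if known.size:
            out[idx[all_labels[idx] < 0]] = np.bincount(known).argmax()
    return out[:len(features)]  # drop the appended refill samples
```

In this sketch the refilled samples keep their original labels; the rule in Note 2, under which refilled samples outside the initial selection may be relabeled, is illustrated after the Notes.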
Notes
1. Refilling the space with more samples of the same class can form a more compact and clearer cluster.
2. In any case, the refilled samples, which were already labeled, can only have their labels changed if they did not belong to the initial selection set (40%); a small sketch of this rule follows.
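As a minimal illustration of the rule in Note 2, assuming each refilled sample carries a flag recording whether it was part of the initial 40% selection (the flag and function names below are ours, not the paper's):

```python
# Hypothetical illustration of Note 2: a refilled sample keeps its label
# if it belonged to the initial selection set (40%); otherwise a newly
# proposed label (e.g., a cluster majority vote) may replace it.
def update_refilled_labels(old_labels, proposed_labels, in_initial_selection):
    return [old if protected else new
            for old, new, protected in zip(old_labels, proposed_labels,
                                           in_initial_selection)]

# The first sample was in the initial selection, so it keeps label 3;
# the second was not, so it may switch from 1 to 2.
print(update_refilled_labels([3, 1], [4, 2], [True, False]))  # -> [3, 2]
```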
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bolaños, M., Garolera, M., Radeva, P. (2015). Object Discovery Using CNN Features in Egocentric Videos. In: Paredes, R., Cardoso, J., Pardo, X. (eds.) Pattern Recognition and Image Analysis. IbPRIA 2015. Lecture Notes in Computer Science, vol. 9117. Springer, Cham. https://doi.org/10.1007/978-3-319-19390-8_8
DOI: https://doi.org/10.1007/978-3-319-19390-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19389-2
Online ISBN: 978-3-319-19390-8