Abstract
Egocentric videos are usually of long duration and contains lot of redundancy which makes summarization an essential task for such videos. In this work we are targeting object triggered egocentric video summarization which aims at extracting all the occurrences of an object in a given video, in near real time. We propose a modular pipeline which first aims at limiting the redundant information and then uses a Convolutional Neural Network and LSTM based approach for object detection. Following this we represent the video as a dictionary which captures the semantic information in the video. Matching a query object reduces to doing an And-Or Tree traversal followed by deepmatching algorithm for fine grained matching. The frames containing the object, which would have been missed at the pruning stage are retrieved by running a tracker on the frames selected by the pipeline mentioned. The modular pipeline allows replacing any module with its more efficient version. Performance tests ran on the overall pipeline for egocentric datasets, EDUB dataset and personal recorded videos, give an average recall of 0.76.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
goPro Camera. https://gopro.com/
Bolaños, M., Radeva, P.: Ego-object discovery. CoRR abs/1504.01639 (2015). http://arxiv.org/abs/1504.01639
Corso, J.J., Alahi, A., Grauman, K., Hager, G.D., Morency, L.P., Sawhney, H., Sheikh, Y.: Video analysis for body-worn cameras in law enforcement. arXiv preprint (2016). arXiv:1604.03130
Fathi, A., Li, Y., Rehg, J.M.: Learning to recognize daily actions using gaze. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 314–327. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33718-5_23
Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981). http://doi.acm.org/10.1145/358669.358692
Girshick, R.B.: Fast R-CNN. CoRR abs/1504.08083 (2015). http://arxiv.org/abs/1504.08083
Hare, S., Golodetz, S., Saffari, A., Vineet, V., Cheng, M.M., Hicks, S.L., Torr, P.H.: Struck: structured output tracking with kernels. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2096–2109 (2016)
Johnson, J., Karpathy, A., Fei-Fei, L.: Densecap: fully convolutional localization networks for dense captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Lee, Y.J., Ghosh, J., Grauman, K.: Discovering important people and objects for egocentric video summarization. In: CVPR, vol. 2, p. 7 (2012)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint (2013). arXiv:1301.3781
Nebot, A., Binefa, X., de Mántaras, R.L.: Artificial intelligence research and development. In: Proceedings of the 19th International Conference of the Catalan Association for Artificial Intelligence, vol. 288, Barcelona, Catalonia, Spain. IOS Press, 19–21 October 2016
Nilsson, N.: Artificial Intelligence: A New Synthesis. Morgan Kaufmann Series in Arti. Morgan Kaufmann Publishers, Burlington (1998). https://books.google.co.in/books?id=LIXBRwkibdEC
Pech-Pacheco, J.L., Cristóbal, G., Chamorro-Martinez, J., Fernández-Valdivia, J.: Diatom autofocusing in brightfield microscopy: a comparative study. In: 15th International Conference on Pattern Recognition Proceedings, vol. 3, pp. 314–317. IEEE (2000)
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. CoRR abs/1612.08242 (2016). http://arxiv.org/abs/1612.08242
Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. CoRR abs/1506.01497 (2015). http://arxiv.org/abs/1506.01497
Revaud, J., Weinzaepfel, P., Harchaoui, Z., Schmid, C.: Deepmatching: hierarchical deformable dense matching. Int. J. Comput. Vision 120, 1–24 (2015)
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to sift or surf. In: 2011 International Conference on Computer Vision, pp. 2564–2571. IEEE (2011)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Schwarz, H., Marpe, D., Wiegand, T.: Overview of the scalable video coding extension of the h. 264/avc standard. IEEE Trans. Circuits Syst. Video Technol. 17(9), 1103–1120 (2007)
Sivic, J., Schaffalitzky, F., Zisserman, A.: Efficient object retrieval from videos. In: 2004 12th European Signal Processing Conference, pp. 1737–1740, September 2004
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans. Pattern Anal. Mach. Intell. (2016)
Xiong, B., Kim, G., Sigal, L.: Storyline representation of egocentric videos with an applications to story-based search. In: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV 2015, pp. 4525–4533 (2015). doi:10.1109/ICCV.2015.514
Xu, J., Mukherjee, L., Li, Y., Warner, J., Rehg, J.M., Singh, V.: Gaze-enabled egocentric video summarization via constrained submodular maximization. In: Proceedings CVPR (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Jain, S., Rameshan, R.M., Nigam, A. (2017). Object Triggered Egocentric Video Summarization. In: Felsberg, M., Heyden, A., Krüger, N. (eds) Computer Analysis of Images and Patterns. CAIP 2017. Lecture Notes in Computer Science(), vol 10425. Springer, Cham. https://doi.org/10.1007/978-3-319-64698-5_36
Download citation
DOI: https://doi.org/10.1007/978-3-319-64698-5_36
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64697-8
Online ISBN: 978-3-319-64698-5
eBook Packages: Computer ScienceComputer Science (R0)