Object Triggered Egocentric Video Summarization

Jain, Samriddhi; Rameshan, Renu M.; Nigam, Aditya

doi:10.1007/978-3-319-64698-5_36

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10425)

Included in the following conference series:

International Conference on Computer Analysis of Images and Patterns

Abstract

Egocentric videos are usually long and contain a lot of redundancy, which makes summarization an essential task for such videos. In this work we target object-triggered egocentric video summarization, which aims at extracting all occurrences of an object in a given video in near real time. We propose a modular pipeline that first limits redundant information and then uses a Convolutional Neural Network and LSTM based approach for object detection. Following this, we represent the video as a dictionary that captures its semantic information. Matching a query object then reduces to an And-Or tree traversal followed by the DeepMatching algorithm for fine-grained matching. Frames that contain the object but were missed at the pruning stage are retrieved by running a tracker on the frames selected by the pipeline. The modular design allows any module to be replaced with a more efficient version. Performance tests run on the overall pipeline for egocentric datasets, the EDUB dataset and personally recorded videos, give an average recall of 0.76.
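As a rough illustration of the retrieval idea and of the reported metric, the following minimal sketch assumes the video dictionary can be approximated by a plain inverted index from detected object labels to frame indices. It does not reproduce the And-Or tree traversal, the fine-grained matching, or the tracker described in the abstract. The function names (`build_label_index`, `query_frames`, `recall`) and the toy detections are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def build_label_index(detections):
    """Map each detected label to the sorted frame indices where it appears.

    `detections` is a hypothetical dict: frame index -> list of object labels.
    """
    index = defaultdict(set)
    for frame_idx, labels in detections.items():
        for label in labels:
            index[label].add(frame_idx)
    return {label: sorted(frames) for label, frames in index.items()}


def query_frames(index, label):
    """Return the frames indexed under `label`, or an empty list if unseen."""
    return index.get(label, [])


def recall(retrieved, relevant):
    """Fraction of relevant frames that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)


# Toy data (assumption): labels detected in the frames kept after pruning.
detections = {0: ["cup", "table"], 1: ["table"], 4: ["cup"], 7: ["phone", "cup"]}
index = build_label_index(detections)

retrieved = query_frames(index, "cup")
ground_truth = [0, 3, 4, 7]  # frame 3 was dropped before detection in this toy case
score = recall(retrieved, ground_truth)
print(retrieved, score)
```

In this toy case the query misses frame 3, which never reached the detector. That is the kind of gap the abstract says the tracker stage is meant to recover.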

Author information

Authors and Affiliations

Indian Institute of Technology Mandi, Mandi, Himachal Pradesh, India
Samriddhi Jain, Renu M. Rameshan & Aditya Nigam

Corresponding author

Correspondence to Samriddhi Jain.

Editor information

Editors and Affiliations

Linköping University, Linköping, Sweden
Michael Felsberg
Lund University, Lund, Sweden
Anders Heyden
University of Southern Denmark, Odense, Denmark
Norbert Krüger

About this paper

Cite this paper

Jain, S., Rameshan, R.M., Nigam, A. (2017). Object Triggered Egocentric Video Summarization. In: Felsberg, M., Heyden, A., Krüger, N. (eds) Computer Analysis of Images and Patterns. CAIP 2017. Lecture Notes in Computer Science, vol 10425. Springer, Cham. https://doi.org/10.1007/978-3-319-64698-5_36

DOI: https://doi.org/10.1007/978-3-319-64698-5_36
Published: 28 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64697-8
Online ISBN: 978-3-319-64698-5
eBook Packages: Computer Science, Computer Science (R0)
