DOI: 10.1145/2964284.2971474
research-article

First Person View Video Summarization Subject to the User Needs

Published: 01 October 2016

ABSTRACT

Our lives are becoming heavily documented and expressed on the digital substrate. This booming flow of consumer video has led to an increasing demand for multimedia analysis tools to organize and summarize those visual memories. Due to the personal nature of such videos, though, the summarization needs to be adapted to the user's needs and preferences. Yet most summarization systems rely solely on pre-defined criteria, e.g. story coherence or pre-trained interestingness classifiers. I propose a system capable of finding digital memories relevant to a given semantic query and then summarizing them in a customized manner. The proposed framework includes a wide set of tools to match a user's needs, from retrieval using multimodal queries to summarization tailored to the user's preferences, provided both passively and actively. Preliminary results show the high potential of such a framework, with over 70% retrieval accuracy. More importantly, as seen from the user study, the generated summaries achieve an unprecedented compromise between usability and quality.

