DOI: 10.1145/1772690.1772783

What are the most eye-catching and ear-catching features in the video?: implications for video summarization

Published: 26 April 2010

ABSTRACT

Video summarization is a mechanism for generating short summaries of a video that help people quickly make sense of its content before downloading it or seeking more detailed information. To produce reliable automatic video summarization algorithms, it is essential to first understand how human beings create video summaries manually. This paper focuses on a corpus of instructional documentary video and seeks to improve automatic video summaries by identifying which features in the video catch the eyes and ears of human assessors, and by using these findings to inform automatic summarization algorithms. The paper contributes a methodology for informing automatic video summarization that can be extended to the summarization of other video corpora.
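The abstract does not spell out the summarization algorithm itself, so the following Python sketch is purely illustrative and not the authors' method: assuming each shot already carries hypothetical visual_salience and audio_salience scores (stand-ins for the eye-catching and ear-catching features the paper studies), a simple summarizer could greedily keep the highest-scoring shots until a target summary duration is reached.

from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    start: float            # shot start time, in seconds
    end: float              # shot end time, in seconds
    visual_salience: float  # hypothetical "eye-catching" score in [0, 1]
    audio_salience: float   # hypothetical "ear-catching" score in [0, 1]

    @property
    def duration(self) -> float:
        return self.end - self.start


def summarize(shots: List[Shot], budget: float,
              w_visual: float = 0.5, w_audio: float = 0.5) -> List[Shot]:
    """Greedily pick the most salient shots until the duration budget is spent.

    Illustrative only: the combined score is a simple weighted sum of the
    hypothetical visual and audio salience features.
    """
    ranked = sorted(
        shots,
        key=lambda s: w_visual * s.visual_salience + w_audio * s.audio_salience,
        reverse=True,
    )
    summary: List[Shot] = []
    used = 0.0
    for shot in ranked:
        if used + shot.duration <= budget:
            summary.append(shot)
            used += shot.duration
    # Play the chosen shots in their original order so the skim stays coherent.
    return sorted(summary, key=lambda s: s.start)


if __name__ == "__main__":
    shots = [
        Shot(0.0, 8.0, visual_salience=0.2, audio_salience=0.9),
        Shot(8.0, 20.0, visual_salience=0.7, audio_salience=0.3),
        Shot(20.0, 25.0, visual_salience=0.9, audio_salience=0.8),
    ]
    for s in summarize(shots, budget=15.0):
        print(f"keep shot {s.start:.1f}-{s.end:.1f}s")

Re-ordering the selected shots chronologically keeps the resulting skim watchable; in a real system, the weights and the salience scores themselves would be derived from findings about what human assessors actually attend to.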



Published in
WWW '10: Proceedings of the 19th International Conference on World Wide Web
April 2010, 1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690
Copyright © 2010 International World Wide Web Conference Committee (IW3C2)
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
