Event-Driven Video Abstraction and Visualization

Multimedia Tools and Applications

Abstract

In this paper, we propose a new video summarization procedure that produces a dynamic (video) abstract of the original video sequence. Our technique compactly summarizes video data while preserving its original temporal characteristics (visual activity) and semantically essential information. It relies on adaptive nonlinear sampling: the local sampling rate is directly proportional to the amount of visual activity in localized sub-shot units of the video. To obtain very short yet semantically meaningful summaries, we also present an event-oriented abstraction scheme in which two semantic events, emotional dialogue and violent action, are characterized and abstracted into the video summary before all other events. If the length of the summary permits, other non-key events are then added. The resulting video abstract is highly compact.
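
The abstract describes the method only at a high level. As a concrete illustration, here is a minimal Python sketch of one way the two steps could fit together, assuming a per-frame visual-activity score has already been computed; the `adaptive_sample` and `event_driven_summary` helpers, the `key_events` segment list, and the frame `budget` are illustrative names, not the authors' implementation. Sampling at uniform steps in cumulative-activity space makes the local sampling rate proportional to activity, and frames from detected key events are committed to the summary before any remaining budget is spent.

```python
import numpy as np

def adaptive_sample(activity, budget):
    """Pick `budget` frame indices whose local density is proportional
    to the per-frame visual activity (adaptive nonlinear sampling)."""
    cum = np.cumsum(np.asarray(activity, dtype=float))
    # Uniform steps in cumulative-activity space map to nonuniform,
    # activity-proportional steps in frame space.
    targets = np.linspace(0.0, cum[-1], num=budget)
    return np.clip(np.searchsorted(cum, targets), 0, len(cum) - 1)

def event_driven_summary(activity, key_events, budget):
    """Commit frames from key events (e.g. detected emotional-dialogue
    or violent-action segments) first; spend any leftover budget on an
    activity-proportional sample of the remaining frames."""
    selected = []
    for start, end in key_events:
        stop = min(end, start + budget - len(selected))
        selected.extend(range(start, stop))
        if len(selected) >= budget:
            return sorted(selected[:budget])
    remaining = budget - len(selected)
    mask = np.ones(len(activity), dtype=bool)
    mask[selected] = False                # exclude already-chosen frames
    rest = np.flatnonzero(mask)
    if remaining > 0 and rest.size > 0:
        extra = adaptive_sample(np.asarray(activity, dtype=float)[rest],
                                remaining)
        selected.extend(rest[extra].tolist())
    return sorted(set(selected))

# Toy example: a quiet stretch, a high-activity burst, one short key event.
activity = [0.1] * 50 + [1.0] * 20 + [0.1] * 30
print(event_driven_summary(activity, key_events=[(60, 66)], budget=12))
```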




Cite this article

Nam, J., Tewfik, A.H. Event-Driven Video Abstraction and Visualization. Multimedia Tools and Applications 16, 55–77 (2002). https://doi.org/10.1023/A:1013241718521
