Abstract
In this paper, we propose a video summarization procedure that produces a dynamic (video) abstract of the original video sequence. Our technique compactly summarizes video data while preserving its original temporal characteristics (visual activity) and semantically essential information. It relies on adaptive nonlinear sampling: the local sampling rate is directly proportional to the amount of visual activity in localized sub-shot units of the video. To obtain very short yet semantically meaningful summaries, we also present an event-oriented abstraction scheme in which two semantic events, emotional dialogue and violent action, are characterized and abstracted into the video summary before all other events. If the length of the summary permits, other non-key events are then added. The resulting video abstract is highly compact.
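The core sampling idea — a local sampling rate proportional to local visual activity — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a per-frame activity score is already available (e.g., mean absolute frame difference), and picks frame indices at equal increments of cumulative activity, so high-activity sub-shots contribute more frames to the summary.

```python
def adaptive_sample(activity, n_samples):
    """Pick up to n_samples frame indices, denser where activity is high.

    Stepping through the cumulative activity in equal increments makes
    the local sampling rate proportional to the local activity level.
    """
    total = sum(activity)
    step = total / n_samples
    picks, acc = [], 0.0
    target = step / 2  # sample at midpoints of equal-activity bins
    for i, a in enumerate(activity):
        acc += a
        if acc >= target and len(picks) < n_samples:
            picks.append(i)
            target += step
    return picks

# Toy clip: low activity, a burst of action, then low activity again.
activity = [1, 1, 1, 10, 10, 10, 1, 1, 1]
print(adaptive_sample(activity, 4))  # -> [3, 4, 5, 6]: frames cluster in the burst
```

With uniform activity the same routine degenerates to ordinary uniform subsampling, which matches the intuition that nonlinear sampling only departs from linear sampling where the visual activity varies.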
About this article
Cite this article
Nam, J., Tewfik, A.H. Event-Driven Video Abstraction and Visualization. Multimedia Tools and Applications 16, 55–77 (2002). https://doi.org/10.1023/A:1013241718521