Abstract
In applications such as post-production and archiving of audiovisual material, users are confronted with large amounts of redundant, unedited raw material, called rushes. Viewing and organizing this material are crucial but time-consuming tasks. Typically, the rushes contain multiple, slightly different takes of the same scene. We propose a method for detecting and clustering takes of one scene shot from the same or very similar camera positions. An important subproblem is determining the similarity of video segments, for which we propose a distance measure based on the Longest Common Subsequence (LCSS) model. Two variants of the proposed approach, one with a threshold parameter and one with an automatically determined threshold, are compared against the Dynamic Time Warping (DTW) distance measure on six videos from the TRECVID 2007 BBC rushes summarization data set. We also evaluate how the temporal segmentation method applied to the input influences the results. Applications of the proposed method to automatic skimming and interactive browsing of rushes video are described.
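The abstract does not spell out the proposed measure, so the following is only a rough illustration of the two families of sequence distances it names, not the authors' method. It sketches a generic LCSS-style distance with a matching threshold and a textbook DTW distance over per-frame feature vectors. The function names, the `eps` parameter, the Euclidean frame distance, and the normalization by the shorter sequence length are all assumptions made for this sketch.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def lcss_length(a, b, eps):
    """Length of the longest common subsequence of two feature sequences.

    Assumption for this sketch: two frames match when the Euclidean
    distance between their feature vectors is at most eps.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if dist(a[i - 1], b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_distance(a, b, eps):
    """Distance in [0, 1]: 0 when the shorter sequence is fully matched.

    Normalizing by the shorter length is an illustrative choice only.
    """
    if not a or not b:
        return 1.0
    return 1.0 - lcss_length(a, b, eps) / min(len(a), len(b))


def dtw_distance(a, b):
    """Textbook DTW: minimum summed frame distance over monotone alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Under these assumptions, the practical difference is that LCSS can skip frames that have no counterpart in the other sequence, whereas DTW must align every frame to some frame of the other sequence. For two hypothetical takes that differ only by one inserted outlier frame, `lcss_distance` ignores the outlier, while `dtw_distance` pays for aligning it.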
Cite this article
Bailer, W., Lee, F. & Thallinger, G. A distance measure for repeated takes of one scene. TVC 25, 53–68 (2009). https://doi.org/10.1007/s00371-008-0280-6