Abstract
Automatic categorization of video shots is the first and necessary step in organizing a long video stream into high-level scenes. However, existing techniques for video shot categorization still suffer from the semantic gap between low-level audio-visual features and high-level semantic concepts. To bridge this gap, recent research has focused on characterizing (1) the spatio-temporal coherence among shots and (2) the bipartite correlation between descriptive features and shot categories. In the most recent work, spectral clustering methods and information-theoretic co-clustering (ITCC) have been actively studied and applied to these two issues, respectively. In this paper, we investigate the effectiveness of the two algorithms for video shot categorization. They are compared in terms of estimating the number of clusters and classification accuracy, with the K-means clustering algorithm as the benchmark. Experiments on 4 h of sports videos show that both algorithms perform better than K-means. While the ITCC algorithm has an advantage in estimating the number of clusters, spectral clustering achieves better classification accuracy.
Cite this article
Wang, P., Liu, ZQ. & Yang, SQ. Investigation on unsupervised clustering algorithms for video shot categorization. Soft Comput 11, 355–360 (2007). https://doi.org/10.1007/s00500-006-0089-z
DOI: https://doi.org/10.1007/s00500-006-0089-z