Abstract
In this paper, we develop a content-based video classification approach to support semantic categorization, high-dimensional indexing and multi-level access. Our contributions are fourfold: (a) First, we present a hierarchical video database model that captures the structures and semantics of video contents in databases. One advantage of this model is that it provides a framework for automatic mapping from high-level concepts to low-level representative features. (b) Second, we propose a set of techniques for exploiting the basic units (e.g., shots or objects) to access the videos in the database. (c) Third, we suggest a learning-based semantic classification technique that exploits the structures and semantics of video contents in the database. (d) Finally, we develop a cluster-based indexing structure that both speeds up query-by-example and organizes databases to support more effective browsing. The applications of the proposed multi-level video database representation and indexing structures to MPEG-7 are also discussed.
Cite this article
Fan, J., Zhu, X., Hacid, MS. et al. Model-Based Video Classification toward Hierarchical Representation, Indexing and Access. Multimedia Tools and Applications 17, 97–120 (2002). https://doi.org/10.1023/A:1014635823052
DOI: https://doi.org/10.1023/A:1014635823052