Abstract
Supporting content-based video database access over the Internet Protocol (IP) requires achieving three objectives: (i) video query by a representative object (key object) or by a statistical characterization of the target contents, (ii) bandwidth-efficient browsing over IP, and (iii) scalable, user-centric video transmission over a heterogeneous network with variable bandwidth. We present a video object extraction and scalable coding system designed to meet these objectives. In our system, key objects that are meaningful to video database users are generated through a human-computer interaction procedure and are then tracked across frames. Given a key object, an algorithm classifies a subset of its video object planes (VOPs) as key VOPs. This subset forms the basis of a highly bandwidth-efficient base layer that supports activities such as browsing and query refinement. On top of the base layer, a number of enhancement layers can be defined to progressively increase the spatial and temporal resolution of the retrieved video. Heterogeneous users can then subscribe to different numbers of enhancement layers according to their own conditions, such as access authorization, available connection bandwidth, and quality preference.
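The abstract does not say how a client decides how many enhancement layers to receive. Below is a minimal sketch, not the authors' method. It assumes each layer has a known, fixed bit rate and that layers are cumulative, so enhancement layer k is only usable if the base layer and layers 1..k-1 are also received. It also assumes that access authorization or quality preference can be expressed as a cap on the number of enhancement layers. Given those assumptions, a client can greedily take the longest prefix of layers that fits its available bandwidth. The `Layer` type, the `select_layers` helper, and the example bit rates are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    bitrate_kbps: float  # hypothetical bit rate of this layer alone


def select_layers(layers: List[Layer], available_kbps: float,
                  max_enhancements: int) -> List[Layer]:
    """Return the base layer plus the longest prefix of enhancement layers
    that fits within available_kbps (assumed cumulative layering).

    max_enhancements is an assumed cap standing in for access authorization
    or quality preference.
    """
    if not layers:
        return []
    base = layers[0]
    if base.bitrate_kbps > available_kbps:
        return []  # bandwidth is too low even for the key-VOP base layer
    chosen = [base]
    used = base.bitrate_kbps
    # Enhancement layers are only useful in order, so stop at the first misfit.
    for layer in layers[1:1 + max(0, max_enhancements)]:
        if used + layer.bitrate_kbps > available_kbps:
            break
        chosen.append(layer)
        used += layer.bitrate_kbps
    return chosen


# Hypothetical layer set: a base layer built from key VOPs, then enhancements.
EXAMPLE_LAYERS = [
    Layer("base (key VOPs)", 32),
    Layer("enh1 temporal", 96),
    Layer("enh2 spatial", 256),
    Layer("enh3 temporal", 512),
]

if __name__ == "__main__":
    for layer in select_layers(EXAMPLE_LAYERS, available_kbps=400, max_enhancements=3):
        print(layer.name)
```

With 400 kbit/s available, this client receives the base layer and the first two enhancement layers (32 + 96 + 256 = 384 kbit/s). Adding the third would exceed the budget. A client authorized for only one enhancement layer stops after `enh1` regardless of bandwidth. The greedy prefix rule follows directly from the cumulative-layer assumption. If layers could be decoded independently, a knapsack-style selection would be needed instead.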
Cite this article
Fan, J., Zhu, X., Najarian, K. et al. Accessing Video Contents through Key Objects over IP. Multimedia Tools and Applications 21, 75–96 (2003). https://doi.org/10.1023/A:1025086200838