research-article

Place retrieval with graph-based place-view model

Authors:

Xianming Liu

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval

Pages 268 - 275

https://doi.org/10.1145/1460096.1460141

Published: 30 October 2008

Abstract

Places in movies and sitcoms can provide higher-level semantic cues about story scenarios and actor relations. This paper presents a novel unsupervised framework for efficient place retrieval in movies and sitcoms. We use face detection to filter close-up frames out of the video dataset, and adopt saliency map analysis to separate background places from foreground actions. We then extract a pyramid-based, spatially encoded correlogram from shot key frames for robust place representation. To describe varying place appearances effectively, we cluster key frames and model which clusters belong to the same place through inside-shot association. Hierarchical normalized cut is then applied to the association graph to differentiate physical places within videos and obtain their multi-view representation as a tree structure. For efficient place matching in a large-scale database, inverted indexing is applied to the hierarchical graph structure, on top of which approximate nearest neighbor search is proposed to greatly accelerate the search process. Experimental results on a Friends sitcom database of over 36 hours demonstrate the effectiveness, efficiency, and semantic revealing ability of our framework.
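The inside-shot association and normalized-cut steps named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the co-occurrence-count edge weights, the function names (`build_association`, `ncut_value`, `best_bipartition`), and the toy shot data are all assumptions made for illustration, and the sketch minimizes the normalized-cut criterion by brute force over a five-node graph rather than by the spectral relaxation normally used at scale. It shows only a single two-way split, whereas a hierarchical scheme would presumably apply such splits recursively.

```python
from itertools import combinations

def build_association(shots, n_clusters):
    """Hypothetical weighting: count how often two key-frame clusters
    appear inside the same shot (assumed form of inside-shot association)."""
    W = [[0.0] * n_clusters for _ in range(n_clusters)]
    for shot in shots:
        for i, j in combinations(sorted(set(shot)), 2):
            W[i][j] += 1.0
            W[j][i] += 1.0
    return W

def ncut_value(W, part_a, part_b):
    """Normalized-cut criterion: cut/assoc(A, V) + cut/assoc(B, V)."""
    n = len(W)
    cut = sum(W[i][j] for i in part_a for j in part_b)
    assoc_a = sum(W[i][j] for i in part_a for j in range(n))
    assoc_b = sum(W[i][j] for i in part_b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # a side with no edges gives no usable split
    return cut / assoc_a + cut / assoc_b

def best_bipartition(W):
    """Brute-force search over all two-way splits (toy graphs only)."""
    nodes = set(range(len(W)))
    best = (float("inf"), None, None)
    for size in range(1, len(W) // 2 + 1):
        for subset in combinations(sorted(nodes), size):
            a, b = set(subset), nodes - set(subset)
            value = ncut_value(W, a, b)
            if value < best[0]:
                best = (value, a, b)
    return best

# Toy data (assumed): each shot lists the cluster ids of its key frames.
shots = [[0, 1], [1, 2], [0, 2], [3, 4], [3, 4], [2, 3]]
W = build_association(shots, n_clusters=5)
value, part_a, part_b = best_bipartition(W)
print(sorted(part_a), sorted(part_b), round(value, 4))
```

On this toy input the lowest-cost split separates clusters 0, 1, 2 from clusters 3, 4, which can be read as two places, each with several views. Exhaustive search grows exponentially with the number of clusters, so it is suitable only for illustrating the criterion.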

Index Terms

Place retrieval with graph-based place-view model
1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information storage systems

Information & Contributors

Information

Published In

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval

October 2008

506 pages

ISBN:9781605583129

DOI:10.1145/1460096

General Chair:
Michael S. Lew, Leiden University, The Netherlands

Program Chairs:
Alberto del Bimbo, University of Florence, Italy
Erwin M. Bakker, Leiden University, The Netherlands

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM08

Sponsor:

MM08: ACM Multimedia Conference 2008

October 30 - 31, 2008

Vancouver, British Columbia, Canada

Citations

Cited By

Huang H, Zhang Y, Huang Q, Guo Z, Liu Z, Lin D (2020). Placepedia: Comprehensive Place Understanding with Multi-faceted Annotations. Computer Vision – ECCV 2020, pp. 85-103. Online publication date: 12-Nov-2020.
https://doi.org/10.1007/978-3-030-58589-1_6
Zhang K, Sun H, Shi W, Feng Y, Jiang Z, Zhao J (2019). A Video Representation Method Based on Multi-View Structure Preserving Embedding for Action Retrieval. IEEE Access 7, pp. 50400-50411. Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2019.2905641
Han T, Yao H, Xu C, Sun X, Zhang Y, Corso J (2017). Dancelets Mining for Video Recommendation Based on Dance Styles. IEEE Transactions on Multimedia 19:4, pp. 712-724. Online publication date: 1-Apr-2017.
https://dl.acm.org/doi/10.1109/TMM.2016.2631881
Ramezani M, Yaghmaee F (2016). A review on human action analysis in videos for retrieval applications. Artificial Intelligence Review 46:4, pp. 485-514. Online publication date: 1-Dec-2016.
https://dl.acm.org/doi/10.1007/s10462-016-9473-y
Han T, Yao H, Sun X, Zhang Y, Zhao S, Lu X, Huang Y, Xie W, Zhou X, Smeaton A, Tian Q, Bulterman D, Shen H, Mayer-Patel K, Yan S (2015). "Clustering of Dancelets". Proceedings of the 23rd ACM international conference on Multimedia, pp. 915-918. Online publication date: 13-Oct-2015.
https://dl.acm.org/doi/10.1145/2733373.2806363
Sun B, Yao H, Ji R, Xu P, Sun X, Yuan K (2010). Individual Home-Video Collecting Using a Co-clustering Method. 2010 First International Conference on Pervasive Computing, Signal Processing and Applications, pp. 1132-1135. Online publication date: Sep-2010.
https://doi.org/10.1109/PCSPA.2010.279
Yuan K, Yao H, Ji R, Sun X (2010). Mining actor correlations with hierarchical concurrence parsing. 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 798-801. Online publication date: Mar-2010.
https://doi.org/10.1109/ICASSP.2010.5494953
Yuan K, Ji R, Yao H, Sun X, Xu P, Liu X, Li S, Lu H, Xu D, Bimbo A, Tian Q, Xu C (2009). VisualCor system. Proceedings of the First International Conference on Internet Multimedia Computing and Service, pp. 213-218. Online publication date: 23-Nov-2009.
https://dl.acm.org/doi/10.1145/1734605.1734655
