research-article

Cross-media manifold learning for image retrieval & annotation

Authors:

Tianqiang Liu

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval

Pages 141 - 148

https://doi.org/10.1145/1460096.1460121

Published: 30 October 2008

Abstract

Fusing visual content with textual information is an effective approach to both content-based and keyword-based image retrieval. However, the performance of visual & textual fusion is strongly affected by noise and redundancy in both the textual data (such as the text surrounding an image in an HTML page) and the visual data (such as intra-class diversity). This paper presents a manifold-based cross-media optimization scheme that achieves visual & textual fusion within a unified framework. We propose a cross-media manifold co-training mechanism between a Keyword-Based Metric Space and a Vision-Based Metric Space, which infers the best dual-space fusion by minimizing a manifold-based visual & textual energy criterion. We present Isomorphic Manifold Learning, which maps the effect of annotation in the image visual space onto the keyword semantic space by manifold shrinkage, and we demonstrate its correctness and convergence mathematically. Retrieval can be performed with either keywords or sample images, on the Keyword-Based Metric Space and the Vision-Based Metric Space respectively, and simple distance classifiers suffice. Two groups of experiments are conducted. The first is carried out on the Corel 5000 image database to validate the method's effectiveness against the state-of-the-art Generalized Manifold Ranking Based Image Retrieval and SVM. The second is carried out on a real-world Flickr dataset of over 6,000 images to test the method's effectiveness in a real-world application. The results show that our model attains a significant improvement over state-of-the-art algorithms.
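
The abstract notes that, once the two metric spaces have been learned, retrieval reduces to a simple distance classifier. The following is a minimal sketch of that final step only. It assumes images have already been embedded as coordinate vectors in one of the metric spaces; the embedding itself is not shown, and the function name `rank_by_distance`, the choice of Euclidean distance, and the toy coordinates are all hypothetical illustrations rather than details taken from the paper.

```python
import math

def rank_by_distance(query, embedded, top_k=3):
    """Return the ids of the top_k items nearest to the query point.

    `embedded` maps an item id to its coordinate tuple. Euclidean
    distance is an assumption; the paper's actual metric may differ.
    """
    scored = [(math.dist(query, vec), item_id) for item_id, vec in embedded.items()]
    scored.sort()  # ascending distance, so nearest items come first
    return [item_id for _, item_id in scored[:top_k]]

# Hypothetical 2-D coordinates standing in for a learned metric space.
embedded_images = {
    "img_a": (0.0, 0.0),
    "img_b": (1.0, 0.0),
    "img_c": (5.0, 5.0),
}

# The query point could come from a sample image or from a keyword,
# depending on which metric space is being searched.
nearest = rank_by_distance((0.9, 0.1), embedded_images, top_k=2)
```

In this toy setup `nearest` lists `img_b` before `img_a`, because the query point lies closest to `img_b`. The same ranking routine would apply in either space, since only the coordinates change.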

Index Terms

Cross-media manifold learning for image retrieval & annotation
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval

October 2008

506 pages

ISBN:9781605583129

DOI:10.1145/1460096

General Chair:
Michael S. Lew, Leiden University, The Netherlands

Program Chairs:
Alberto del Bimbo, University of Florence, Italy
Erwin M. Bakker, Leiden University, The Netherlands

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM08: ACM Multimedia Conference 2008

October 30 - 31, 2008

Vancouver, British Columbia, Canada
