DOI: 10.1145/1460096.1460121
research-article

Cross-media manifold learning for image retrieval & annotation

Published: 30 October 2008

Abstract

Fusing visual content with textual information is an effective approach to both content-based and keyword-based image retrieval. However, the performance of visual and textual fusion is strongly affected by noise and redundancy in both the textual data (such as the surrounding text in HTML pages) and the visual data (such as intra-class diversity). This paper presents a manifold-based cross-media optimization scheme that achieves visual and textual fusion within a unified framework. We propose a cross-media manifold co-training mechanism between a Keyword-Based Metric Space and a Vision-Based Metric Space, which infers the best dual-space fusion by minimizing a manifold-based visual and textual energy criterion. We present Isomorphic Manifold Learning, which maps the influence of annotations in the image visual space onto the keyword semantic space by manifold shrinkage, and we demonstrate its correctness and convergence mathematically. Retrieval can be performed with either keywords or sample images, on the Keyword-Based Metric Space and the Vision-Based Metric Space respectively, and simple distance classifiers suffice. Two groups of experiments are conducted. The first is carried out on the Corel 5000 image database and validates the effectiveness of our method by comparison with the state-of-the-art Generalized Manifold-Ranking-Based Image Retrieval and with SVM. The second is carried out on a real-world Flickr dataset of over 6,000 images and tests the method's effectiveness in a real-world application. The results show that our model attains a significant improvement over state-of-the-art algorithms.
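The abstract states that, once the metric spaces are learned, retrieval needs only a simple distance classifier. The sketch below illustrates that final step alone, as plain nearest-neighbor ranking by Euclidean distance. It is not the paper's algorithm: the function names, the two-dimensional coordinates, and the image identifiers are all hypothetical, and the manifold learning and co-training stages that would produce real coordinates are not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, k=3):
    """Return the ids of the k database items closest to the query point.

    `database` maps an item id to its coordinates in some metric space.
    """
    ranked = sorted(database, key=lambda item_id: euclidean(query, database[item_id]))
    return ranked[:k]


# Hypothetical coordinates standing in for a learned metric space.
images = {
    "beach_01": (0.9, 0.1),
    "beach_02": (0.8, 0.2),
    "forest_01": (0.1, 0.9),
    "forest_02": (0.2, 0.8),
}

# Query by example: use the coordinates of a sample image as the query point.
print(retrieve((0.9, 0.1), images, k=2))
```

The same ranking function would serve a keyword query, assuming the keyword has first been mapped to a point in the corresponding keyword space.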

Published In

MIR '08: Proceedings of the 1st ACM international conference on Multimedia information retrieval
October 2008
506 pages
ISBN: 9781605583129
DOI: 10.1145/1460096

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. automatic image annotation
2. co-training
3. content-based image retrieval
4. manifold learning
5. web image search

Conference

MM08: ACM Multimedia Conference 2008
October 30 - 31, 2008
Vancouver, British Columbia, Canada
