DOI: 10.1145/973264.973274

A bootstrapping approach to annotating large image collection


ABSTRACT

A huge amount of manual effort is required to annotate large image/video archives with text annotations. Several recent works have attempted to automate this task by employing supervised learning approaches to associate visual information extracted from segmented images with semantic concepts provided by the associated text. The main limitation of such approaches, however, is that a large labeled training corpus is still needed for effective learning, and semantically meaningful segmentation of images is in general unavailable. This paper explores the use of a bootstrapping approach to tackle this problem. The idea is to start from a small set of labeled training examples and successively annotate a larger set of unlabeled examples. This is done using the co-training approach, in which two "statistically independent" classifiers are used to co-train and co-annotate the unlabeled examples. An active learning approach is used to select the best examples to label at each stage of learning in order to maximize the learning objective. To accomplish this, we break the task of annotating images into the sub-tasks of: (a) segmenting images into meaningful units, (b) extracting appropriate features for the units, and (c) associating these features with text. Because of the uncertainty in sub-tasks (a) and (b), we adopt two independent segmentation methods (task a) and two independent sets of features (task b) to support co-training. We carried out experiments on a mid-sized image collection (comprising about 6,000 images from Corel CD, PhotoCD and the Web) and demonstrated that our bootstrapping approach significantly improves annotation performance, by about 10% in terms of the F1 measure, compared with the best results obtained from the traditional supervised learning approach. Moreover, the bootstrapping approach has the key advantage of requiring far fewer labeled training examples.
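For illustration, the following is a minimal Python sketch of one round of the co-training bootstrapping loop described in the abstract, assuming two feature "views" of each image (e.g. produced by two different segmentation methods and feature sets) and generic probabilistic classifiers (scikit-learn SVMs). It is not the authors' implementation; the view matrices, the confidence and disagreement heuristics, and parameters such as n_add and n_query are all assumptions introduced here.

# Illustrative sketch only (not the authors' system): one round of
# co-training-based bootstrapping with a simple active-learning step.
# Assumes two feature "views" of the same labeled pool (X_a, X_b, y)
# and of the unlabeled pool (U_a, U_b), all as NumPy arrays.
import numpy as np
from sklearn.svm import SVC


def cotrain_round(X_a, X_b, y, U_a, U_b, n_add=20, n_query=10):
    # Train one probabilistic classifier per view.
    clf_a = SVC(probability=True).fit(X_a, y)
    clf_b = SVC(probability=True).fit(X_b, y)

    p_a = clf_a.predict_proba(U_a)
    p_b = clf_b.predict_proba(U_b)
    pred_a = clf_a.classes_[p_a.argmax(axis=1)]
    pred_b = clf_b.classes_[p_b.argmax(axis=1)]

    # Co-annotation: trust unlabeled examples on which both views agree
    # with high joint confidence, and move them into the labeled pool.
    joint_conf = (p_a.max(axis=1) + p_b.max(axis=1)) / 2 * (pred_a == pred_b)
    add = np.argsort(-joint_conf)[:n_add]

    # Active learning: the lowest-confidence (most disagreed-on) examples
    # are the best candidates to hand to a human annotator.
    query = np.argsort(joint_conf)[:n_query]

    X_a = np.vstack([X_a, U_a[add]])
    X_b = np.vstack([X_b, U_b[add]])
    y = np.concatenate([y, pred_a[add]])
    keep = np.setdiff1d(np.arange(len(U_a)), add)
    return (X_a, X_b, y), (U_a[keep], U_b[keep]), query

In the paper's setting, the two views would come from two independent segmentation methods and two independent feature sets, which is what makes the classifiers approximately statistically independent and allows each to improve the other across bootstrapping rounds.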



Published in

MIR '03: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval
November 2003
281 pages
ISBN: 1581137788
DOI: 10.1145/973264

            Copyright © 2003 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 7 November 2003


