DOI: 10.1145/973264.973274

A bootstrapping approach to annotating large image collection


ABSTRACT

A huge amount of manual effort is required to annotate large image/video archives with text annotations. Several recent works have attempted to automate this task by employing supervised learning approaches to associate visual information extracted from segmented images with semantic concepts provided by the associated text. The main limitation of such approaches, however, is that a large labeled training corpus is still needed for effective learning, and semantically meaningful segmentation of images is in general unavailable. This paper explores the use of a bootstrapping approach to tackle this problem. The idea is to start from a small set of labeled training examples and successively annotate a larger set of unlabeled examples. This is done using the co-training approach, in which two "statistically independent" classifiers are used to co-train and co-annotate the unlabeled examples. An active learning approach is used to select the best examples to label at each stage of learning in order to maximize the learning objective. To accomplish this, we break the task of annotating images into the sub-tasks of: (a) segmenting images into meaningful units, (b) extracting appropriate features for the units, and (c) associating these features with text. Because of the uncertainty in sub-tasks (a) and (b), we adopt two independent segmentation methods (task a) and two independent sets of features (task b) to support co-training. We carried out experiments on a mid-sized image collection (comprising about 6,000 images from Corel CD, PhotoCD and the Web) and demonstrated that our bootstrapping approach significantly improves annotation performance, by about 10% in terms of the F1 measure, compared with the best results obtained from the traditional supervised learning approach. Moreover, the bootstrapping approach has the key advantage of requiring far fewer labeled training examples.
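For illustration, the following is a minimal Python sketch of one round of the co-training bootstrapping loop described in the abstract, assuming two feature "views" of each image (e.g. produced by two different segmentation methods and feature sets) and generic probabilistic classifiers (scikit-learn SVMs). It is not the authors' implementation; the view matrices, the confidence and disagreement heuristics, and parameters such as n_add and n_query are all assumptions introduced here.

# Illustrative sketch only (not the authors' system): one round of
# co-training-based bootstrapping with a simple active-learning step.
# Assumes two feature "views" of the same labeled pool (X_a, X_b, y)
# and of the unlabeled pool (U_a, U_b), all as NumPy arrays.
import numpy as np
from sklearn.svm import SVC


def cotrain_round(X_a, X_b, y, U_a, U_b, n_add=20, n_query=10):
    # Train one probabilistic classifier per view.
    clf_a = SVC(probability=True).fit(X_a, y)
    clf_b = SVC(probability=True).fit(X_b, y)

    p_a = clf_a.predict_proba(U_a)
    p_b = clf_b.predict_proba(U_b)
    pred_a = clf_a.classes_[p_a.argmax(axis=1)]
    pred_b = clf_b.classes_[p_b.argmax(axis=1)]

    # Co-annotation: trust unlabeled examples on which both views agree
    # with high joint confidence, and move them into the labeled pool.
    joint_conf = (p_a.max(axis=1) + p_b.max(axis=1)) / 2 * (pred_a == pred_b)
    add = np.argsort(-joint_conf)[:n_add]

    # Active learning: the lowest-confidence (most disagreed-on) examples
    # are the best candidates to hand to a human annotator.
    query = np.argsort(joint_conf)[:n_query]

    X_a = np.vstack([X_a, U_a[add]])
    X_b = np.vstack([X_b, U_b[add]])
    y = np.concatenate([y, pred_a[add]])
    keep = np.setdiff1d(np.arange(len(U_a)), add)
    return (X_a, X_b, y), (U_a[keep], U_b[keep]), query

In the paper's setting, the two views would come from two independent segmentation methods and two independent feature sets, which is what makes the classifiers approximately statistically independent and allows each to improve the other across bootstrapping rounds.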



Published in

MIR '03: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval
November 2003
281 pages
ISBN: 1581137788
DOI: 10.1145/973264

            Copyright © 2003 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 7 November 2003


