Evaluation of Localized Semantics: Data, Methodology, and Experiments

Barnard, Kobus; Fan, Quanfu; Swaminathan, Ranjini; Hoogs, Anthony; Collins, Roderic; Rondot, Pascale; Kaufhold, John

doi:10.1007/s11263-007-0068-6

Published: 07 August 2007

Volume 77, pages 199–217, (2008)
International Journal of Computer Vision


Abstract

We present a new data set of 1014 images with manual segmentations and semantic labels for each segment, together with a methodology for using this kind of data for recognition evaluation. The images and segmentations are from the UCB segmentation benchmark database (Martin et al., in International conference on computer vision, vol. II, pp. 416–421, 2001). The database is extended by manually labeling each segment with its most specific semantic concept in WordNet (Miller et al., in Int. J. Lexicogr. 3(4):235–244, 1990). The evaluation methodology establishes protocols for mapping algorithm-specific localization (e.g., segmentations) to our data, handling synonyms, scoring matches at different levels of specificity, dealing with vocabularies with sense ambiguity (the usual case), and handling ground truth regions with multiple labels. Given these protocols, we develop two evaluation approaches. The first measures the range of semantics that an algorithm can recognize, and the second measures the frequency with which an algorithm recognizes semantics correctly. The data, the image labeling tool, and programs implementing our evaluation strategy are all available on-line (kobus.ca//research/data/IJCV_2007).
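
The abstract does not give formulas for the two evaluation approaches, so the following is only a minimal sketch of one plausible reading, not the authors' scoring programs. It assumes that "frequency correct" is the fraction of regions labeled correctly, and that "semantic range" averages accuracy per concept so that rare and common concepts count equally. The function names and the exact-match comparison are assumptions made for illustration; the paper's protocols for synonyms, specificity levels, and multiple labels are not modeled here.

```python
from collections import defaultdict


def frequency_correct(pairs):
    """Fraction of regions whose predicted label equals the ground-truth label."""
    if not pairs:
        return 0.0
    return sum(1 for truth, pred in pairs if truth == pred) / len(pairs)


def semantic_range(pairs):
    """Mean per-concept accuracy (assumed reading): each concept counts once."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in pairs:
        totals[truth] += 1
        hits[truth] += int(truth == pred)
    if not totals:
        return 0.0
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical (ground truth, prediction) pairs, one per region.
pairs = [("sky", "sky"), ("sky", "sky"), ("sky", "sky"), ("tiger", "sky")]
print(frequency_correct(pairs))  # 0.75
print(semantic_range(pairs))     # 0.5
```

Under these assumptions, an algorithm that always predicts a common word scores well on frequency correct but poorly on semantic range, which is why the two measures can rank algorithms differently.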

We apply this infrastructure to evaluate four algorithms that learn to label image regions from weakly labeled data. The algorithms tested include two variants of multiple instance learning (MIL), and two generative multi-modal mixture models. These experiments are on a significantly larger scale than previously reported, especially in the case of MIL methods. More specifically, we used training data sets of up to 37,000 images and training vocabularies of up to 650 words.
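
To make "weakly labeled data" concrete, here is a minimal sketch of how image-level keywords can be cast as a multiple instance learning problem. It assumes the standard MIL setup, in which each image is a bag of region feature vectors and a bag is positive for a word when at least one of its regions is. The data layout, the `make_bags` and `bag_is_positive` helpers, and the toy region classifier are all hypothetical and are not taken from the paper.

```python
def make_bags(images, word):
    """Turn annotated images into (bag of regions, bag label) pairs for one word."""
    return [(img["regions"], word in img["keywords"]) for img in images]


def bag_is_positive(regions, instance_classifier):
    """Standard MIL assumption: a bag is positive if any instance is positive."""
    return any(instance_classifier(r) for r in regions)


# Hypothetical images: region features are made-up 2-D vectors.
images = [
    {"regions": [(0.1, 0.9), (0.8, 0.2)], "keywords": {"sky", "tiger"}},
    {"regions": [(0.2, 0.8)], "keywords": {"sky"}},
]

bags = make_bags(images, "tiger")

# Toy stand-in for a learned region classifier (an assumption for illustration).
def looks_like_tiger(region):
    return region[0] > 0.5

predictions = [bag_is_positive(regions, looks_like_tiger) for regions, _ in bags]
print([label for _, label in bags])  # [True, False]
print(predictions)                   # [True, False]
```

The training labels attach to whole images, while the classifier scores individual regions; that gap is what lets a MIL method label regions without region-level supervision.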

We found that one of the mixture models performed best on image annotation and the frequency correct measure, and that variants of MIL gave the best semantic range performance. We were able to substantively improve the performance of MIL methods on the other tasks (image annotation and frequency correct region labeling) by providing an appropriate prior.

References

Agarwal, S., Awan, A., & Roth, D. (2002). The UIUC image database for car detection. Available from http://l2r.cs.uiuc.edu/~cogcomp/Data/Car/.
Agarwal, S., Awan, A., & Roth, D. (2004). Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11), 1475–1490.
Agirre, E., & Rigau, G. (1995). A proposal for word sense disambiguation using conceptual distance. In 1st international conference on recent advances in natural language processing, Velingrad.
Andrews, S., & Hofmann, T. (2004). Multiple instance learning via disjunctive programming boosting. In Advances in neural information processing systems (NIPS 16).
Andrews, S., Hofmann, T., & Tsochantaridis, I. (2002a). Multiple instance learning with generalized support vector machines. Menlo Park: AAAI.
Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002b). Advances in neural information processing systems: Vol. 15. Support vector machines for multiple-instance learning, Vancouver, BC.
Barnard, K., & Forsyth, D. (2001). Learning the semantics of words and pictures. In International conference on computer vision (Vol. II, pp. 408–415).
Barnard, K., Duygulu, P., & Forsyth, D. (2001). Clustering art. In IEEE conference on computer vision and pattern recognition (Vol. II, pp. 434–441), Hawaii.
Barnard, K. et al. (2003a). Matching words and pictures. Journal of Machine Learning Research, 3, 1107–1135.
Barnard, K., Duygulu, P., Raghavendra, K. G., Gabbur, P., & Forsyth, D. (2003b). The effects of segmentation and feature choice in a translation model of object recognition. In IEEE conference on computer vision and pattern recognition (Vol. II, pp. 675–682), Madison, WI.
Berg, A. C., Berg, T. L., & Malik, J. (2005). Shape matching and object recognition using low distortion correspondence. In CVPR.
Carbonetto, P., Freitas, N. D., & Barnard, K. (2004). A statistical model for general contextual object recognition. In European conference on computer vision.
Chang, C.-C., & Lin, C.-J. (2001). LIBSVM—a library for support vector machines. Available from http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
Chen, Y., & Wang, J. Z. (2004). Image categorization by learning and reasoning with regions. Journal of Machine Learning Research, 5, 913–939.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.
Fei-Fei, L., Fergus, R., & Perona, P. (2004). Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In Workshop on generative-model based vision. Washington, DC.
Fellbaum, C., Miller, P. G. A., Tengi, R., & Wakefield, P. (1998). WordNet—a lexical database for English. Available from http://www.cogsci.princeton.edu/~wn.
Fergus, R., & Perona, P. (1998). The Caltech database. Available from http://www.vision.caltech.edu/html-files/archive.html.
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In IEEE conference on computer vision and pattern recognition. Madison, WI.
Gabbur, P. (2003). Quantitative evaluation of feature sets, segmentation algorithms, and color constancy algorithms using word prediction. Masters Thesis, University of Arizona, Tucson, AZ.
Gale, W., Church, K., & Yarowsky, D. (1992). One sense per discourse. In DARPA workshop on speech and natural language (pp. 233–237). New York.
Jeon, J., Lavrenko, V., & Manmatha, R. (2003). Automatic image annotation and retrieval using cross-media relevance models. In SIGIR.
Jonker, R., & Volgenant, A. (1987). A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38, 325–340.
Karov, Y., & Edelman, S. (1998). Similarity-based word sense disambiguation. Computational Linguistics, 24(1), 41–59.
Lavrenko, V., Manmatha, R., & Jeon, J. (2003). A model for learning the semantics of pictures. In NIPS.
Leibe, B., & Schiele, B. (2003). The TU Darmstadt database. Available from http://www.vision.ethz.ch/leibe/data/.
Maron, O. (1998). Learning from ambiguity. PhD thesis, Massachusetts Institute of Technology.
Maron, O., & Lozano-Perez, T. (1998). A framework for multiple-instance learning. In Neural information processing systems. Cambridge: MIT Press.
Maron, O., & Ratan, A. L. (1998). Multiple-instance learning for natural scene classification. In The fifteenth international conference on machine learning.
Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In International conference on computer vision (Vol. II, pp. 416–421).
Mihalcea, R., & Moldovan, D. (1998). Word sense disambiguation based on semantic density. In COLING/ACL workshop on usage of wordnet in natural language processing systems. Montreal.
Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. J. (1990). Introduction to WordNet: an on-line lexical database. International Journal of Lexicography, 3(4), 235–244.
Opelt, A., & Pinz, A. (2004). TU Graz-02 database. Available from http://www.emt.tugraz.at/~pinz/data/GRAZ_02/.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(9), 888–905.
Tao, Q., & Scott, S. (2004). A faster algorithm for generalized multiple-instance learning. In Seventeenth annual FLAIRS conference (pp. 550–555), Miami Beach, FL.
Tao, Q., Scott, S., Vinodchandran, N. V., Osugi, T. T., & Mueller, B. (2004a). An extended kernel for generalized multiple-instance learning. In IEEE international conference on tools with artificial intelligence.
Tao, Q., Scott, S. D., & Vinodchandran, N. V. (2004b). SVM-based generalized multiple-instance learning via approximate box counting. In International conference on machine learning (pp. 779–806). Banff, AB, Canada.
Torralba, A., Murphy, K. P., & Freeman, W. T. (2003). The MIT-CSAIL database of objects and scenes. Available from http://web.mit.edu/torralba/www/database.html.
Torralba, A., Murphy, K. P., & Freeman, W. T. (2004). Sharing features: efficient boosting procedures for multiclass object detection. In IEEE conference on computer vision and pattern recognition (Vol. II, pp. 762–769), Washington, DC.
Traupman, J., & Wilensky, R. (2003). Experiments in improving unsupervised word sense disambiguation. Computer Science Division, University of California Berkeley.
Vivarelli, F., & Williams, C. K. I. (1997). Using Bayesian neural networks to classify segmented images. In IEE international conference on artificial neural networks.
Weber, M., Welling, M., & Perona, P. (2000). Unsupervised learning of models for recognition. In D. Vernon (Ed.), 6th European conference on computer vision (pp. 18–32).
Yarowsky, D. (1995). Unsupervised word sense disambiguation rivaling supervised methods. In 33rd conference on applied natural language processing. Cambridge: ACL.
Zhang, Q., & Goldman, S. A. (2001). EM-DD: an improved multiple-instance learning technique. In Neural information processing systems.

Author information

Authors and Affiliations

Computer Science Department, The University of Arizona, P.O. Box 210077, Tucson, AZ, 85721-0077, USA
Kobus Barnard, Quanfu Fan & Ranjini Swaminathan
GE Global Research, One Research Circle, Schenectady, NY, 12309, USA
Anthony Hoogs & Roderic Collins
Aeronautics, Lockheed Martin Corp., Lockheed Boulevard, Ft. Worth, TX, 76108, USA
Pascale Rondot
Advanced Concepts Business Unit, SAIC Corp., 1710 SAIC Drive, McLean, VA, 22102, USA
John Kaufhold

Corresponding author

Correspondence to Kobus Barnard.

About this article

Cite this article

Barnard, K., Fan, Q., Swaminathan, R. et al. Evaluation of Localized Semantics: Data, Methodology, and Experiments. Int J Comput Vis 77, 199–217 (2008). https://doi.org/10.1007/s11263-007-0068-6

Received: 12 September 2005
Accepted: 29 May 2007
Published: 07 August 2007
Issue Date: May 2008
DOI: https://doi.org/10.1007/s11263-007-0068-6
