Performance measures for multilabel evaluation: a case study in the area of image classification

ABSTRACT
As the number of multimedia documents on the web and at home grows steadily, so does the need for reliable semantic indexing methods that assign multiple keywords to a document. The performance of existing approaches is often measured with standard evaluation measures from the information retrieval community. In a case study on image annotation, we analyse the behaviour of 13 different evaluation measures and point out their strengths and weaknesses. The analysis uses the submissions of 19 research groups that participated in the ImageCLEF Photo Annotation Task, together with several baseline configurations based on random numbers. We further investigate a recently proposed ontology-based measure that incorporates the structure of the ontology, the relationships between its concepts and the inter-annotator agreement per concept, and compare it to a hierarchical variant. The hierarchical variant does not yield competitive results, whereas the ontology-based measure assigns good scores to the systems that also rank highly under the other measures, such as the example-based F-measure. For concept-based evaluation, MAP proves stable with respect to the random baselines and the number of annotated labels, and the AUC measure shows good evaluation characteristics provided that all annotations contain confidence values.
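To make two of the measures named above concrete, the following is a minimal illustrative sketch (not the paper's implementation): the example-based F-measure, which averages the per-image F1 between predicted and ground-truth label sets, and per-concept average precision, whose mean over all concepts gives MAP. Function names and the tiny toy inputs are our own assumptions for illustration.

```python
from typing import List, Set

def example_based_f1(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Example-based F-measure: mean over images of the F1 score
    between the predicted and ground-truth label sets."""
    scores = []
    for t, p in zip(true, pred):
        if not t and not p:           # both sets empty: perfect agreement
            scores.append(1.0)
            continue
        inter = len(t & p)
        if inter == 0:                # no overlap (or one side empty)
            scores.append(0.0)
            continue
        prec = inter / len(p)
        rec = inter / len(t)
        scores.append(2 * prec * rec / (prec + rec))
    return sum(scores) / len(scores)

def average_precision(relevant: List[int], conf: List[float]) -> float:
    """AP for one concept: rank images by confidence score and
    average the precision attained at each relevant position.
    MAP is the mean of this value over all concepts."""
    order = sorted(range(len(conf)), key=lambda i: -conf[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            total += hits / rank
    n_rel = sum(relevant)
    return total / n_rel if n_rel else 0.0
```

Note that the example-based view aggregates over images while the concept-based view aggregates over labels; the paper's point is that these two perspectives can rank annotation systems differently.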