Performance measures for multilabel evaluation: a case study in the area of image classification

ABSTRACT
As the number of multimedia documents on the web and at home grows steadily, so does the need for reliable semantic indexing methods that assign multiple keywords to a document. The performance of existing approaches is often measured with standard evaluation measures from the information retrieval community. In a case study on image annotation, we analyse the behaviour of 13 different evaluation measures and point out their strengths and weaknesses. The analysis uses the submissions of 19 research groups that participated in the ImageCLEF Photo Annotation Task, together with several baseline configurations based on random numbers. We further investigate a recently proposed ontology-based measure that incorporates the structure of the ontology, the relationships between its concepts and the inter-annotator agreement per concept, and compare it to a hierarchical variant. The hierarchical variant does not yield competitive results, whereas the ontology-based measure assigns good scores to the systems that also rank highly under the other measures, such as the example-based F-measure. For concept-based evaluation, MAP proves stable with respect to the random baselines and the number of annotated labels, and the AUC measure shows good evaluation characteristics provided that all annotations contain confidence values.
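To make two of the measures named above concrete, the following is a minimal illustrative sketch (not the paper's implementation): the example-based F-measure, which averages the per-image F1 between predicted and ground-truth label sets, and per-concept average precision, whose mean over all concepts gives MAP. Function names and the tiny toy inputs are our own assumptions for illustration.

```python
from typing import List, Set

def example_based_f1(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Example-based F-measure: mean over images of the F1 score
    between the predicted and ground-truth label sets."""
    scores = []
    for t, p in zip(true, pred):
        if not t and not p:           # both sets empty: perfect agreement
            scores.append(1.0)
            continue
        inter = len(t & p)
        if inter == 0:                # no overlap (or one side empty)
            scores.append(0.0)
            continue
        prec = inter / len(p)
        rec = inter / len(t)
        scores.append(2 * prec * rec / (prec + rec))
    return sum(scores) / len(scores)

def average_precision(relevant: List[int], conf: List[float]) -> float:
    """AP for one concept: rank images by confidence score and
    average the precision attained at each relevant position.
    MAP is the mean of this value over all concepts."""
    order = sorted(range(len(conf)), key=lambda i: -conf[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            total += hits / rank
    n_rel = sum(relevant)
    return total / n_rel if n_rel else 0.0
```

Note that the example-based view aggregates over images while the concept-based view aggregates over labels; the paper's point is that these two perspectives can rank annotation systems differently.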