
The ClasSi coefficient for the evaluation of ranking quality in the presence of class similarities

  • Research Article
Frontiers of Computer Science

Abstract

Evaluation measures play an important role in the design of new approaches, and quality is often measured by assessing the relevance of the obtained result set. While many evaluation measures derived from precision and recall rely on a binary relevance model, ranking correlation coefficients are better suited for multi-class problems. State-of-the-art ranking correlation coefficients such as Kendall’s τ and Spearman’s ρ do not allow the user to specify similarities between different object classes. They therefore treat the transposition of objects from similar classes the same way as the transposition of objects from dissimilar classes. We propose ClasSi, a new ranking correlation coefficient that deals with class label rankings and employs a class distance function to model the similarities between the classes. We also introduce a graphical representation of ClasSi that describes how the correlation evolves throughout the ranking.
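The claim that Kendall’s τ cannot tell a transposition of similar classes from one of dissimilar classes can be checked on a toy ranking. The following is a minimal sketch under stated assumptions: the class names, the ideal class order, and the distance values in `DIST` are hypothetical, and `kendall_tau_a` is the plain tau-a variant of Kendall’s τ. The function `weighted_score` is an illustrative class-distance-weighted variant in the spirit of ClasSi. It weights each discordant pair by its class distance and normalizes by the cost of the reversed ideal ranking. It is not the coefficient as defined in the paper.

```python
from itertools import combinations

# Hypothetical toy setup (not from the paper): the ideal class order for a
# query of class "A", plus assumed symmetric class distances.
IDEAL_ORDER = ["A", "B", "C"]
POS = {c: k for k, c in enumerate(IDEAL_ORDER)}
DIST = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "C"): 3.0}


def class_dist(x, y):
    """Symmetric class distance; identical classes have distance 0."""
    if x == y:
        return 0.0
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]


def kendall_tau_a(ranking):
    """Kendall's tau-a of a class label ranking against the ideal class order.
    Pairs with the same label count as neither concordant nor discordant."""
    n = len(ranking)
    if n < 2:
        return 1.0  # convention for rankings without any pair
    conc = disc = 0
    for a, b in combinations(ranking, 2):  # a is ranked before b
        if POS[a] < POS[b]:
            conc += 1
        elif POS[a] > POS[b]:
            disc += 1
    return (conc - disc) / (n * (n - 1) / 2)


def discordance_cost(ranking):
    """Sum of class distances over all discordant pairs."""
    return sum(class_dist(a, b)
               for a, b in combinations(ranking, 2) if POS[a] > POS[b])


def weighted_score(ranking):
    """Hypothetical weighted variant: 1 for the ideal ranking, -1 for the
    reversed ideal ranking (illustrative only, not the paper's ClasSi)."""
    worst = discordance_cost(sorted(ranking, key=POS.get, reverse=True))
    if worst == 0:  # all items share one class: nothing can be misordered
        return 1.0
    return 1 - 2 * discordance_cost(ranking) / worst


similar_swap = ["B", "A", "C"]     # transposes the similar classes A and B
dissimilar_swap = ["A", "C", "B"]  # transposes the dissimilar classes B and C
print(kendall_tau_a(similar_swap), kendall_tau_a(dissimilar_swap))    # equal
print(weighted_score(similar_swap), weighted_score(dissimilar_swap))  # 0.75 0.25
```

Both rankings contain exactly one discordant pair, so Kendall’s τ scores them identically (1/3). The weighted variant penalizes the dissimilar transposition more. If every pair of distinct classes is given distance 1 and no labels repeat, the weighted score reduces to Kendall’s tau-a. This holds because tau-a equals 1 − 2D/(n(n−1)/2) for D discordant pairs, and in the reversed ranking every pair is discordant.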



Author information

Corresponding author

Correspondence to Anca Maria Ivanescu.

Additional information

Anca Maria Ivanescu received her Master’s degree in Computer Science in January 2009 from RWTH Aachen University, Germany, and is currently a PhD student in the data management and data exploration group there. Her research interests include distance-based similarity search and evaluation measures, as well as model-based learning.

Marc Wichterich received his Master’s degree and PhD in Computer Science from RWTH Aachen University, Germany, in 2005 and 2010, respectively. In line with his interest in data mining and similarity search, he is currently working on matching third-party item descriptions to existing catalog entries at amazon.com.

Christian Beecks is a PhD student in Computer Science in the data management and data exploration group at RWTH Aachen University, Germany. His research interests include efficient and effective content-based multimedia retrieval and exploration, and adaptive distance-based similarity measures. His current research focuses on the signature quadratic form distance.

Thomas Seidl is a professor of Computer Science and head of the data management and data exploration group at RWTH Aachen University, Germany. His research interests include data mining and database technology for multimedia and spatio-temporal databases in engineering, communication, and life science applications. Prof. Seidl received his Diplom (MSc) in 1992 from TU Muenchen and his PhD (1997) and venia legendi (2001) from LMU Muenchen.

About this article

Cite this article

Ivanescu, A.M., Wichterich, M., Beecks, C. et al. The ClasSi coefficient for the evaluation of ranking quality in the presence of class similarities. Front. Comput. Sci. 6, 568–580 (2012). https://doi.org/10.1007/s11704-012-1175-2

  • DOI: https://doi.org/10.1007/s11704-012-1175-2
