Abstract
Evaluation measures play an important role in the design of new approaches, and quality is often measured by assessing the relevance of the obtained result set. While many evaluation measures built on precision and recall assume a binary relevance model, ranking correlation coefficients are better suited for multi-class problems. State-of-the-art ranking correlation coefficients such as Kendall’s τ and Spearman’s ρ do not allow the user to specify similarities between different object classes, and thus treat the transposition of objects from similar classes the same way as that of objects from dissimilar classes. We propose ClasSi, a new ranking correlation coefficient that operates on class label rankings and employs a class distance function to model the similarities between classes. We also introduce a graphical representation of ClasSi that shows how the correlation evolves throughout the ranking.
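The limitation described above can be illustrated with a minimal sketch. It computes Kendall’s τ for two rankings that each differ from an ideal ranking by one adjacent transposition, once between similar classes and once between dissimilar classes. The object names, class labels, and distance values are invented for illustration, and `weighted_discordance` is a hypothetical helper that merely sums class distances over discordant pairs; it is not the ClasSi definition, which is not given in this abstract.

```python
from itertools import combinations


def kendall_tau(ideal, observed):
    """Kendall's tau between two rankings of the same distinct objects."""
    pos = {obj: i for i, obj in enumerate(observed)}
    concordant = discordant = 0
    # Every pair (a, b) is taken with a ranked before b in the ideal ranking.
    for a, b in combinations(ideal, 2):
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def weighted_discordance(ideal, observed, label, dist):
    """Hypothetical helper (not ClasSi): sum class distances over discordant pairs."""
    pos = {obj: i for i, obj in enumerate(observed)}
    return sum(
        dist[label[a]][label[b]]
        for a, b in combinations(ideal, 2)
        if pos[a] > pos[b]
    )


# Assumed toy data: four objects, three classes, symmetric class distances.
label = {"c1": "cat", "c2": "cat", "d1": "dog", "t1": "truck"}
dist = {
    "cat": {"cat": 0.0, "dog": 0.2, "truck": 1.0},
    "dog": {"cat": 0.2, "dog": 0.0, "truck": 0.9},
    "truck": {"cat": 1.0, "dog": 0.9, "truck": 0.0},
}

ideal = ["c1", "c2", "d1", "t1"]
swap_similar = ["c1", "d1", "c2", "t1"]     # cat and dog transposed
swap_dissimilar = ["c1", "c2", "t1", "d1"]  # dog and truck transposed

# Kendall's tau cannot tell the two errors apart: both have one discordant pair.
tau_similar = kendall_tau(ideal, swap_similar)
tau_dissimilar = kendall_tau(ideal, swap_dissimilar)

# Weighting each discordant pair by class distance separates them.
cost_similar = weighted_discordance(ideal, swap_similar, label, dist)
cost_dissimilar = weighted_discordance(ideal, swap_dissimilar, label, dist)

print(tau_similar, tau_dissimilar)
print(cost_similar, cost_dissimilar)
```

In this toy setting both transpositions receive the same τ, while the distance-weighted count penalizes the dog and truck swap more heavily than the cat and dog swap, which is the kind of distinction a class distance function makes possible.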
Author information
Anca Maria Ivanescu received her Master’s degree in Computer Science from RWTH Aachen University, Germany, in January 2009 and is currently a PhD student in the data management and data exploration group. Her research interests include distance-based similarity search and evaluation measures, as well as model-based learning.
Marc Wichterich received his Master’s degree and PhD in Computer Science from RWTH Aachen University, Germany, in 2005 and 2010, respectively. In line with his interest in data mining and similarity search, he is currently working on matching third-party item descriptions to existing catalog entries at amazon.com.
Christian Beecks is a PhD student in Computer Science in the data management and data exploration group at RWTH Aachen University, Germany. His research interests include efficient and effective content-based multimedia retrieval and exploration, and adaptive distance-based similarity measures. His current research focuses on the signature quadratic form distance.
Thomas Seidl is a professor of Computer Science and head of the data management and data exploration group at RWTH Aachen University, Germany. His research interests include data mining and database technology for multimedia and spatio-temporal databases in engineering, communication, and life science applications. Prof. Seidl received his Diplom (MSc) in 1992 from TU Muenchen and his PhD (1997) and venia legendi (2001) from LMU Muenchen.
Cite this article
Ivanescu, A.M., Wichterich, M., Beecks, C. et al. The ClasSi coefficient for the evaluation of ranking quality in the presence of class similarities. Front. Comput. Sci. 6, 568–580 (2012). https://doi.org/10.1007/s11704-012-1175-2