Abstract
In this paper, the problem of inter-rater agreement is discussed for human observers who judge how similar pairs of images are. In such a setting, significant differences in judgment appear across a group of people. We have observed that, for some pairs of images, all values of the similarity rating are assigned by different people with approximately the same probability. To investigate this phenomenon more thoroughly, we performed experiments in which inter-rater agreement coefficients were used to measure the level of agreement for each given pair of images and for each pair of human judges. The results obtained in the experiments suggest that the level of agreement varies considerably among pairs of images as well as among pairs of judges. We suggest that this effect should be taken into account in the design of computer systems that use image similarity as a criterion.
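The abstract does not specify which inter-rater coefficient was used. As a minimal sketch, assuming Cohen's kappa for a single pair of judges, the Python snippet below shows how such a coefficient could be computed from the two judges' similarity ratings of the same image pairs. The five-point scale, the sample ratings, and the cohens_kappa helper are illustrative assumptions, not the authors' implementation.

    # Minimal sketch (assumption): Cohen's kappa for two raters who assign
    # similarity ratings to the same set of image pairs. The 1-5 scale and
    # the sample data are hypothetical, not taken from the paper.
    from collections import Counter

    def cohens_kappa(ratings_a, ratings_b):
        """Cohen's kappa for two equally long lists of categorical ratings."""
        assert len(ratings_a) == len(ratings_b) and ratings_a
        n = len(ratings_a)

        # Observed agreement: fraction of items both raters labelled identically.
        p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

        # Chance agreement: product of the raters' marginal category frequencies.
        freq_a = Counter(ratings_a)
        freq_b = Counter(ratings_b)
        p_chance = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)

        if p_chance == 1.0:  # degenerate case: both raters use a single category
            return 1.0
        return (p_observed - p_chance) / (1.0 - p_chance)

    # Hypothetical similarity ratings (scale 1-5) from two judges for ten image pairs.
    judge_1 = [5, 4, 2, 1, 3, 5, 4, 2, 1, 3]
    judge_2 = [5, 3, 2, 1, 4, 5, 4, 1, 1, 3]
    print(f"Cohen's kappa: {cohens_kappa(judge_1, judge_2):.3f}")

Repeating such a computation for every pair of judges (and, analogously, restricting it to the ratings of a single image pair) yields the per-judge-pair and per-image-pair agreement levels that the abstract describes.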
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Michalak, K., Dzieńkowski, B., Hudyma, E., Stanek, M. (2011). Analysis of Inter-rater Agreement among Human Observers Who Judge Image Similarity. In: Burduk, R., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Computer Recognition Systems 4. Advances in Intelligent and Soft Computing, vol 95. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20320-6_26
DOI: https://doi.org/10.1007/978-3-642-20320-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20319-0
Online ISBN: 978-3-642-20320-6
eBook Packages: Engineering, Engineering (R0)