Abstract
Similarity measures for binary variables are used in many problems of machine learning, pattern recognition and classification. Dozens of such measures have been introduced, which raises the problem of analyzing them comparatively. One method used for such analysis is clustering of similarity measures based on the correlation between the similarity values that different measures produce on the same data. This paper proposes a method of comparative analysis based on the set-theoretic representation of these measures and a comparison of the algebraic properties of these representations. The results show a relationship between the clustering results and the classification of measures by their properties. Because the results of clustering depend on the clustering method and on the data used to measure the correlation between measures, we conclude that the classification based on the proposed properties is more suitable for the comparative analysis of similarity measures.
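To make the correlation-based comparison concrete, the following minimal Python sketch computes three well-known binary similarity measures from the 2 × 2 table of two binary vectors and then the Pearson correlation between the values each pair of measures produces. It is an illustration only, not the procedure of the paper: the choice of measures (Jaccard, Dice, simple matching), the random data, the convention for the empty case, and all function names are assumptions made for this example.

```python
import math
import random


def counts(x, y):
    """Return the 2 x 2 table (a, b, c, d) for two equal-length 0/1 sequences."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # present in both
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # present in x only
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # present in y only
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # absent in both
    return a, b, c, d


def jaccard(a, b, c, d):
    # Assumption: two all-zero vectors are treated as identical (value 1.0).
    return a / (a + b + c) if a + b + c else 1.0


def dice(a, b, c, d):
    # Same assumption for the empty case as in jaccard().
    return 2 * a / (2 * a + b + c) if a + b + c else 1.0


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


MEASURES = {"jaccard": jaccard, "dice": dice, "simple_matching": simple_matching}


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)


def measure_correlations(pairs):
    """Correlate the similarity values that each pair of measures assigns to `pairs`."""
    values = {
        name: [f(*counts(x, y)) for x, y in pairs] for name, f in MEASURES.items()
    }
    names = list(MEASURES)
    return {
        (m, n): pearson(values[m], values[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


# Hypothetical data: 200 pairs of random binary vectors of length 20.
rng = random.Random(0)
pairs = [
    ([rng.randint(0, 1) for _ in range(20)], [rng.randint(0, 1) for _ in range(20)])
    for _ in range(200)
]
corr = measure_correlations(pairs)
for (m, n), r in corr.items():
    print(f"{m} vs {n}: {r:.3f}")
```

The resulting correlation table could then be passed to a clustering method. Because the Dice value is a monotone function of the Jaccard value, D = 2J / (1 + J), these two measures are expected to correlate strongly on any data, whereas correlations involving simple matching depend on the data used, which is the sensitivity the abstract points to.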
Acknowledgements
The work is partially supported by the projects SIP 20171344, BEIFI of IPN and 283778 of CONACYT.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Mejia, I.R., Batyrshin, I. (2018). Towards a Classification of Binary Similarity Measures. In: Castro, F., Miranda-Jiménez, S., González-Mendoza, M. (eds) Advances in Soft Computing. MICAI 2017. Lecture Notes in Computer Science, vol 10632. Springer, Cham. https://doi.org/10.1007/978-3-030-02837-4_27
DOI: https://doi.org/10.1007/978-3-030-02837-4_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02836-7
Online ISBN: 978-3-030-02837-4
eBook Packages: Computer Science, Computer Science (R0)