Abstract
Similarity measures play an important role in data analysis, because the performance of many classification and clustering techniques depends on choosing an appropriate measure. Most formulas either inflate values through “negative matches” or deflate them through “absence mismatches”. As a result, researchers who need to verify label dependencies in data mining problems such as multi-label classification usually must accept a compromise solution. In this paper, two effective formulas for binary vector comparisons are proposed. Both coefficients exclude “negative matches” but include “absence mismatches”. The formulas’ performance was examined and compared with that of other well-known binary similarity measures, including the Jaccard and Dice coefficients and the Chi-square test for proportions. The experiments were performed on eight multi-label classification datasets that differ in their characteristics and label dependencies. The results showed that both new measures find similarities between binary vectors effectively and outperform the other examined measures.
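To make the role of “negative matches” concrete, the following minimal Python sketch computes the standard 2×2 contingency counts for two binary label vectors and, from them, the Jaccard and Dice coefficients named in the abstract. The simple matching coefficient is added here only as a contrasting example of a measure that counts negative matches; it is not mentioned in the abstract. The function names and the toy data are hypothetical, and the two coefficients proposed in the paper are not reproduced because their formulas are not given in this excerpt.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length 0/1 vectors.

    a: both 1 (positive matches)   b: x=1, y=0 (mismatch)
    c: x=0, y=1 (mismatch)         d: both 0 (negative matches)
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); ignores negative matches."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    """Dice coefficient 2a / (2a + b + c); ignores negative matches."""
    a, b, c, _ = contingency(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / n; counts negative matches."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


# Made-up example: two sparse label columns over ten instances.
label_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
label_2 = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(contingency(label_1, label_2))
print(jaccard(label_1, label_2), dice(label_1, label_2))
print(simple_matching(label_1, label_2))
```

On this toy data the two labels co-occur once and disagree twice, yet the simple matching coefficient is 0.8 because the seven shared absences dominate, whereas Jaccard (1/3) and Dice (0.5) are unaffected by them. Sparse label columns of this kind are common in multi-label data, which is why the treatment of negative matches matters for pairwise label comparison.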
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wosiak, A., Woźniak, R. (2024). New Presence-Dependent Binary Similarity Measures for Pairwise Label Comparisons in Multi-label Classification. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2024. Lecture Notes in Computer Science, vol 14811. Springer, Cham. https://doi.org/10.1007/978-3-031-70819-0_21
DOI: https://doi.org/10.1007/978-3-031-70819-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70818-3
Online ISBN: 978-3-031-70819-0
eBook Packages: Computer Science, Computer Science (R0)