New Presence-Dependent Binary Similarity Measures for Pairwise Label Comparisons in Multi-label Classification

Wosiak, Agnieszka; Woźniak, Rafał

doi:10.1007/978-3-031-70819-0_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14811)

Included in the following conference series:

International Conference on Computational Collective Intelligence

Abstract

Similarity measures play an important role in data analysis, because the performance of many classification and clustering techniques depends on choosing an appropriate measure. Most formulas either inflate their values by counting “negative matches” or lower them because of “absence mismatches”. As a result, researchers who need to verify label dependencies in data mining problems such as multi-label classification usually have to accept a compromise. This paper proposes two formulas for comparing binary vectors. Both coefficients exclude “negative matches” but include “absence mismatches”. Their performance was examined and compared with that of well-known binary similarity measures, including Jaccard and Dice, as well as the Chi-square test for proportions. The experiments were performed on eight multi-label classification datasets with different characteristics and label dependencies. The results show that both new measures were effective at finding similarities between binary vectors and outperformed the other measures examined.
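To make the role of “negative matches” concrete, the following is a minimal sketch of the two baseline measures named in the abstract, Jaccard and Dice, computed for a pair of binary label vectors. It assumes that “negative matches” refers to positions where both vectors are 0, and it uses the conventional contingency counts a, b, c, d, which are notation introduced here rather than taken from the paper. The simple matching coefficient is not mentioned in the abstract and is included only for contrast, as a measure that does count negative matches. The two coefficients proposed in the paper are not defined in the abstract, so the sketch does not attempt to reproduce them.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (a, b, c, d) for two equal-length binary vectors.

    a: both 1 (positive matches)   b: x=1, y=0
    c: x=0, y=1                    d: both 0 (negative matches)
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard similarity a / (a + b + c); ignores negative matches."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    """Dice similarity 2a / (2a + b + c); ignores negative matches."""
    a, b, c, _ = contingency(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching (a + d) / (a + b + c + d); counts negative matches."""
    a, b, c, d = contingency(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


# Hypothetical label columns: each entry is one instance, 1 = label present.
label_a = [1, 1, 0, 0, 0, 0, 0, 0]
label_b = [1, 0, 1, 0, 0, 0, 0, 0]

print(contingency(label_a, label_b))
print(jaccard(label_a, label_b), dice(label_a, label_b))
print(simple_matching(label_a, label_b))
```

In this made-up example the two labels co-occur in only one of the three instances where either is present, yet the simple matching coefficient is high because five instances carry neither label. Jaccard and Dice disregard those five instances entirely.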



Author information

Authors and Affiliations

Institute of Information Technology, Lodz University of Technology, Lodz, Poland
Agnieszka Wosiak & Rafał Woźniak

Corresponding author

Correspondence to Agnieszka Wosiak.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Ngoc Thanh Nguyen
University of Leipzig, Leipzig, Germany
Bogdan Franczyk
University of Leipzig, Leipzig, Sachsen, Germany
André Ludwig
Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, The Netherlands
Jan Treur
University of Münster, Münster, Germany
Gottfried Vossen
Wrocław University of Science and Technology, Wrocław, Poland
Adrianna Kozierkiewicz

Cite this paper

Wosiak, A., Woźniak, R. (2024). New Presence-Dependent Binary Similarity Measures for Pairwise Label Comparisons in Multi-label Classification. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2024. Lecture Notes in Computer Science, vol 14811. Springer, Cham. https://doi.org/10.1007/978-3-031-70819-0_21

DOI: https://doi.org/10.1007/978-3-031-70819-0_21
Published: 31 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70818-3
Online ISBN: 978-3-031-70819-0
eBook Packages: Computer Science, Computer Science (R0)
