New Presence-Dependent Binary Similarity Measures for Pairwise Label Comparisons in Multi-label Classification

  • Conference paper
Computational Collective Intelligence (ICCCI 2024)

Abstract

Similarity measures play an important role in data analysis, as the performance of classification and clustering techniques depends on choosing an appropriate measure. Most formulas either inflate their values through “negative matches” or depreciate them through “absence mismatches”. As a result, researchers who verify label dependencies in data mining problems such as multi-label classification usually must accept a compromise. In this paper, two effective formulas for binary vector comparison are proposed. Both coefficients exclude “negative matches” but include “absence mismatches”. Their performance was examined and compared with other well-known binary similarity measures, including Jaccard and Dice, as well as the Chi-square test for proportions. The experiments were performed on eight datasets with different characteristics regarding the multi-label classification problem and label dependencies. In these experiments, both new measures were effective at finding similarities between binary vectors and outperformed the other examined measures.
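To make the role of “negative matches” concrete, the following minimal Python sketch computes three well-known baseline coefficients for a pair of binary label vectors: Jaccard and Dice, which are named in the abstract, and the simple matching coefficient, which is added here only for contrast. It assumes that “negative matches” means positions where both vectors are 0. The function names and the example vectors are hypothetical, and the two new coefficients proposed in the paper are not reproduced because their formulas are not given on this page.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count the 2x2 contingency cells for two binary vectors.

    a: both 1 (positive match), b: x=1 and y=0, c: x=0 and y=1,
    d: both 0 (assumed here to be what the abstract calls a negative match).
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); ignores negative matches."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    """Dice coefficient 2a / (2a + b + c); ignores negative matches."""
    a, b, c, _ = contingency(x, y)
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / n; counts negative matches."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


# Hypothetical label columns: each entry says whether an instance carries the label.
label_a = [1, 1, 0, 0, 0, 0, 0, 0]
label_b = [1, 0, 1, 0, 0, 0, 0, 0]

print(jaccard(label_a, label_b))          # 1/3
print(dice(label_a, label_b))             # 0.5
print(simple_matching(label_a, label_b))  # 0.75
```

In this toy pair the labels co-occur only once, yet the simple matching coefficient reaches 0.75 because five instances carry neither label. Jaccard and Dice leave those instances out, which is the behaviour the abstract refers to when it speaks of excluding “negative matches”.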

Author information

Corresponding author

Correspondence to Agnieszka Wosiak.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wosiak, A., Woźniak, R. (2024). New Presence-Dependent Binary Similarity Measures for Pairwise Label Comparisons in Multi-label Classification. In: Nguyen, N.T., et al. Computational Collective Intelligence. ICCCI 2024. Lecture Notes in Computer Science, vol 14811. Springer, Cham. https://doi.org/10.1007/978-3-031-70819-0_21

  • DOI: https://doi.org/10.1007/978-3-031-70819-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70818-3

  • Online ISBN: 978-3-031-70819-0

  • eBook Packages: Computer Science, Computer Science (R0)
