DOI: 10.1145/3487664.3487702
research-article

A Hybrid Thresholding Strategy combining RCut and PCut for Multi-label Classification

Published: 30 December 2021

ABSTRACT

Multi-label classification is a variant of the classification problem in which multiple labels may be assigned to each instance. Multi-label classification algorithms usually output a numerical score for each label, indicating that label's relevance to a query instance. In many applications, however, the desired output is a bipartition of the labels into relevant and irrelevant with respect to the query instance. Bipartitions can be obtained from scores using various thresholding strategies, such as the PCut strategy, which selects relevant instances per label, and the RCut strategy, which selects relevant labels per instance. We suggest that a combination of both strategies would provide better classification performance. In this paper, we propose a fuzzy-based approach to combine the PCut and RCut strategies by converting the crisp relevance into a fuzzy one, merging the two linearly, and defuzzifying again. Our experiments show that our hybrid approach indeed outperforms both strategies.
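
The two base strategies named in the abstract can be illustrated with a minimal sketch. It assumes a plain list-of-lists score matrix (rows are instances, columns are labels); the function names `rcut` and `pcut`, the parameters `t` and `k`, and the toy scores are hypothetical and not taken from the paper. The sketch covers only the crisp RCut and PCut bipartitions and does not attempt to reproduce the paper's fuzzy merging step, whose details the abstract does not give.

```python
def rcut(scores, t):
    """RCut sketch: for each instance (row), mark its t highest-scoring labels as relevant."""
    result = []
    for row in scores:
        # Label indices ordered by descending score; ties keep the lower index first.
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:t]
        result.append([1 if j in top else 0 for j in range(len(row))])
    return result


def pcut(scores, k):
    """PCut sketch: for each label (column) j, mark its k[j] highest-scoring instances as relevant."""
    n = len(scores)
    result = [[0] * len(scores[0]) for _ in range(n)]
    for j, k_j in enumerate(k):
        # Instance indices ordered by descending score for label j.
        top = sorted(range(n), key=lambda i: scores[i][j], reverse=True)[:k_j]
        for i in top:
            result[i][j] = 1
    return result


# Toy scores (assumed values): 3 instances x 3 labels.
scores = [
    [0.9, 0.2, 0.4],
    [0.8, 0.7, 0.1],
    [0.3, 0.6, 0.5],
]

by_instance = rcut(scores, t=1)       # one label per instance
by_label = pcut(scores, k=[1, 1, 1])  # one instance per label

for i, (r, p) in enumerate(zip(by_instance, by_label)):
    print(f"instance {i}: RCut={r} PCut={p}")
```

On these toy scores the two strategies agree on the first instance and disagree on the other two, which is the kind of disagreement a hybrid strategy would have to resolve.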

Published in

iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence
November 2021
658 pages

Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article, Research, Refereed limited