research-article

A Hybrid Thresholding Strategy combining RCut and PCut for Multi-label Classification

Authors:
Raji Ghawi, Technical University of Munich, Germany
Juergen Pfeffer, Technical University of Munich, Germany

iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence, November 2021, Pages 278–287
https://doi.org/10.1145/3487664.3487702

Published: 30 December 2021

ABSTRACT

Multi-label classification is a variant of the classification problem in which multiple labels may be assigned to each instance. Multi-label classification algorithms usually output a numerical score for each label, indicating its relevance to a query instance. In many applications, however, the desired output is a bipartition of the labels into relevant and irrelevant with respect to the query instance. Bipartitions can be obtained from scores using various thresholding strategies, such as the PCut strategy, which selects relevant instances per label, and the RCut strategy, which selects relevant labels per instance. We suggest that a combination of both strategies provides better classification performance. In this paper, we propose a fuzzy-based approach to combine the PCut and RCut strategies by converting the crisp relevance decisions into fuzzy ones, merging them linearly, and defuzzifying the result. Our experiments show that the hybrid approach indeed outperforms both individual strategies.
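The pipeline named in the abstract (fuzzify, merge linearly, defuzzify) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual membership functions or parameters, so everything specific here is an assumption made for illustration: the rank-based `membership` function, the mixing weight `alpha`, the 0.5 defuzzification cut, and the per-label quotas `counts` passed to PCut are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # scores[i][j]: score of label j for instance i


def ranks_desc(values: List[float]) -> List[int]:
    """0-based rank of each value; rank 0 goes to the highest score."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def membership(rank: int, cutoff: int) -> float:
    """Hypothetical fuzzification (not from the paper): 1.0 at rank 0,
    exactly 0.5 at the crisp cutoff, 0.0 from rank 2 * cutoff onwards."""
    if cutoff <= 0:
        return 0.0
    return max(0.0, 1.0 - rank / (2.0 * cutoff))


def fuzzy_rcut(scores: Matrix, k: int) -> Matrix:
    """Fuzzy RCut: rank the labels within each instance (row)."""
    return [[membership(r, k) for r in ranks_desc(row)] for row in scores]


def fuzzy_pcut(scores: Matrix, counts: List[int]) -> Matrix:
    """Fuzzy PCut: rank the instances within each label (column).
    counts[j] is an assumed quota of instances for label j."""
    n, m = len(scores), len(scores[0])
    mu = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col_ranks = ranks_desc([scores[i][j] for i in range(n)])
        for i in range(n):
            mu[i][j] = membership(col_ranks[i], counts[j])
    return mu


def defuzzify(mu: Matrix, cut: float = 0.5) -> List[List[int]]:
    """Back to a crisp bipartition: relevant iff membership exceeds the cut."""
    return [[1 if v > cut else 0 for v in row] for row in mu]


def hybrid(scores: Matrix, k: int, counts: List[int], alpha: float) -> List[List[int]]:
    """Linear merge of both fuzzy relevances, then defuzzification."""
    mu_r, mu_p = fuzzy_rcut(scores, k), fuzzy_pcut(scores, counts)
    mixed = [[alpha * a + (1.0 - alpha) * b for a, b in zip(ra, rb)]
             for ra, rb in zip(mu_r, mu_p)]
    return defuzzify(mixed)


scores = [[0.9, 0.2, 0.4],
          [0.1, 0.8, 0.7],
          [0.6, 0.5, 0.3]]
rcut_only = hybrid(scores, k=1, counts=[1, 1, 1], alpha=1.0)  # plain RCut
pcut_only = hybrid(scores, k=1, counts=[1, 1, 1], alpha=0.0)  # plain PCut
mixed = hybrid(scores, k=1, counts=[1, 1, 1], alpha=0.5)
```

With this particular membership function, `alpha=1.0` reduces to crisp RCut (the top `k` labels per instance) and `alpha=0.0` to crisp PCut (the top `counts[j]` instances per label), so intermediate values of `alpha` interpolate between the two strategies.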

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Probabilistic reasoning
      2. Vagueness and fuzzy logic
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Index terms have been assigned to the content through auto-classification.

Published in
iiWAS2021: The 23rd International Conference on Information Integration and Web Intelligence
November 2021
658 pages
ISBN:9781450395564
DOI:10.1145/3487664
Editors:
Eric Pardede,
Maria Indrawan-Santiago,
Pari Delir Haghighi,
Matthias Steinbauer,
Ismail Khalil,
Gabriele Kotsis
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
fuzzy-logic
multilabel classification
thresholding strategy
Qualifiers
- research-article
- Research
- Refereed limited