ABSTRACT
Multi-label classification is a variant of the classification problem in which multiple labels may be assigned to each instance. Multi-label classification algorithms usually output a numerical score for each label, indicating its relevance to a query instance. In many applications, however, the desired output is a bipartition of the labels into relevant and irrelevant with respect to the query instance. Bipartitions can be obtained from scores using various thresholding strategies, such as the PCut strategy, which selects relevant instances per label, and the RCut strategy, which selects relevant labels per instance. We suggest that a combination of both strategies would provide better classification performance. In this paper, we propose a fuzzy-based approach to combine the PCut and RCut strategies by converting the crisp relevance of each into a fuzzy one, merging the two linearly, and defuzzifying the result. Our experiments show that our hybrid approach indeed outperforms both strategies.
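The pipeline described in the abstract (fuzzify each strategy's relevance, merge linearly, defuzzify) can be illustrated with a minimal sketch. The abstract does not give the membership functions or the merging weight, so everything specific here is an assumption made for illustration: the rank-based membership in `rank_membership`, the weight `alpha`, the defuzzification cut of 0.5, and all function names are hypothetical and are not the authors' formulation.

```python
def rank_membership(values, k):
    """Assumed fuzzy relevance from rank: above 0.5 for the top-k entries,
    below 0.5 for the rest, clipped to [0, 1]."""
    if k <= 0:
        return [0.0] * len(values)
    order = sorted(range(len(values)), key=lambda i: -values[i])  # best first
    mu = [0.0] * len(values)
    for rank, i in enumerate(order):
        mu[i] = min(1.0, max(0.0, 1.0 - (rank + 0.5) / (2.0 * k)))
    return mu


def rcut_fuzzy(scores, k):
    """RCut view: rank the labels within each instance (row)."""
    return [rank_membership(row, k) for row in scores]


def pcut_fuzzy(scores, counts):
    """PCut view: rank the instances within each label (column);
    counts[j] is the number of instances to select for label j."""
    n_labels = len(scores[0])
    cols = [rank_membership([row[j] for row in scores], counts[j])
            for j in range(n_labels)]
    # Transpose back to the instance-by-label layout.
    return [[cols[j][i] for j in range(n_labels)] for i in range(len(scores))]


def hybrid(scores, k, counts, alpha=0.5, cut=0.5):
    """Merge both fuzzy relevances linearly, then defuzzify at `cut`."""
    mu_r = rcut_fuzzy(scores, k)
    mu_p = pcut_fuzzy(scores, counts)
    return [[1 if alpha * r + (1 - alpha) * p >= cut else 0
             for r, p in zip(row_r, row_p)]
            for row_r, row_p in zip(mu_r, mu_p)]


# Toy score matrix: 3 instances (rows) by 3 labels (columns).
scores = [
    [0.9, 0.2, 0.4],
    [0.8, 0.7, 0.1],
    [0.3, 0.6, 0.5],
]
bipartition = hybrid(scores, k=1, counts=[1, 2, 1], alpha=0.5)
```

With this particular membership function, `alpha=1.0` reproduces a plain RCut with `k` labels per instance and `alpha=0.0` reproduces a plain PCut with `counts[j]` instances per label, so intermediate values of `alpha` interpolate between the two strategies.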
Index Terms
- A Hybrid Thresholding Strategy combining RCut and PCut for Multi-label Classification