Abstract
Although random forest is one of the best ensemble learning algorithms for single-label classification, exploiting it for multi-label classification remains challenging, and few methods have been investigated in the literature. This paper proposes MLRF, a multi-label classification method based on a variant of random forest. The algorithm introduces a new label-set partition method that transforms a multi-label data set into multiple single-label data sets; it can effectively discover correlated labels to optimize the label-subset partition. For each generated single-label subset, a classifier is learned by an improved random forest algorithm that employs a kNN-like on-line instance sampling method. Experimental results on ten benchmark data sets demonstrate that MLRF outperforms other state-of-the-art multi-label classification algorithms on a variety of evaluation criteria widely used for multi-label classification.
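The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: the greedy chi-square grouping heuristic, the label-powerset encoding within each subset, and the function names (`partition_labels`, `fit_per_subset`) are all assumptions for illustration, and the paper's kNN-like on-line instance sampling is omitted here in favor of a standard scikit-learn random forest.

```python
import numpy as np
from itertools import combinations
from scipy.stats import chi2_contingency
from sklearn.ensemble import RandomForestClassifier

def partition_labels(Y, threshold=0.05):
    """Greedily merge labels whose pairwise chi-square test rejects
    independence (p < threshold); uncorrelated labels stay singletons.
    Y is an (n_samples, n_labels) binary indicator matrix."""
    n_labels = Y.shape[1]
    parent = list(range(n_labels))          # union-find forest over labels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n_labels), 2):
        # 2x2 co-occurrence contingency table for labels i and j
        table = np.zeros((2, 2))
        for a in (0, 1):
            for b in (0, 1):
                table[a, b] = np.sum((Y[:, i] == a) & (Y[:, j] == b))
        # chi2_contingency requires non-degenerate rows/columns
        if (table.sum(axis=0) > 0).all() and (table.sum(axis=1) > 0).all():
            _, p, _, _ = chi2_contingency(table)
            if p < threshold:
                parent[find(i)] = find(j)   # labels are correlated: merge

    groups = {}
    for k in range(n_labels):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())

def fit_per_subset(X, Y, subsets):
    """Train one random forest per label subset, using a label-powerset
    encoding of that subset as a single-label target."""
    models = []
    for cols in subsets:
        target = [''.join(map(str, row)) for row in Y[:, cols]]
        rf = RandomForestClassifier(n_estimators=50, random_state=0)
        models.append((cols, rf.fit(X, target)))
    return models
```

With this sketch, predicting for a new instance means querying each subset's forest and decoding its powerset label back into the member labels; correlated labels are predicted jointly, which is the motivation the abstract gives for partitioning the label set.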
Acknowledgement
Yunming Ye’s work was supported in part by the National Key Technology R&D Program of MOST China under Grant No. 2014BAL05B06, the Shenzhen Science and Technology Program under Grant No. JCYJ20140417172417128, and the Shenzhen Strategic Emerging Industries Program under Grant No. JCYJ20130329142551746. Yan Li’s work was supported in part by NSFC under Grant No. 61303103 and the Shenzhen Science and Technology Program under Grant No. JCY20130331150354073.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Liu, F., Zhang, X., Ye, Y., Zhao, Y., Li, Y.: MLRF: multi-label classification through random forest with label-set partition. In: Huang, D.-S., Han, K. (eds.) Advanced Intelligent Computing Theories and Applications. ICIC 2015. LNCS, vol. 9227. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22053-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22052-9
Online ISBN: 978-3-319-22053-6