Abstract
Although random forest is one of the best ensemble learning algorithms for single-label classification, exploiting it for multi-label classification remains challenging, and few methods have been investigated in the literature. This paper proposes MLRF, a multi-label classification method based on a variant of random forest. The algorithm introduces a new label-set partition method that transforms a multi-label data set into multiple single-label data sets; it can effectively discover correlated labels to optimize the label-subset partition. For each generated single-label subset, a classifier is learned by an improved random forest algorithm that employs a kNN-like on-line instance sampling method. Experimental results on ten benchmark data sets demonstrate that MLRF outperforms other state-of-the-art multi-label classification algorithms on a variety of evaluation criteria widely used for multi-label classification.
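The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: the greedy chi-square grouping heuristic, the label-powerset encoding within each subset, and the function names (`partition_labels`, `fit_per_subset`) are all assumptions for illustration, and the paper's kNN-like on-line instance sampling is omitted here in favor of a standard scikit-learn random forest.

```python
import numpy as np
from itertools import combinations
from scipy.stats import chi2_contingency
from sklearn.ensemble import RandomForestClassifier

def partition_labels(Y, threshold=0.05):
    """Greedily merge labels whose pairwise chi-square test rejects
    independence (p < threshold); uncorrelated labels stay singletons.
    Y is an (n_samples, n_labels) binary indicator matrix."""
    n_labels = Y.shape[1]
    parent = list(range(n_labels))          # union-find forest over labels

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n_labels), 2):
        # 2x2 co-occurrence contingency table for labels i and j
        table = np.zeros((2, 2))
        for a in (0, 1):
            for b in (0, 1):
                table[a, b] = np.sum((Y[:, i] == a) & (Y[:, j] == b))
        # chi2_contingency requires non-degenerate rows/columns
        if (table.sum(axis=0) > 0).all() and (table.sum(axis=1) > 0).all():
            _, p, _, _ = chi2_contingency(table)
            if p < threshold:
                parent[find(i)] = find(j)   # labels are correlated: merge

    groups = {}
    for k in range(n_labels):
        groups.setdefault(find(k), []).append(k)
    return list(groups.values())

def fit_per_subset(X, Y, subsets):
    """Train one random forest per label subset, using a label-powerset
    encoding of that subset as a single-label target."""
    models = []
    for cols in subsets:
        target = [''.join(map(str, row)) for row in Y[:, cols]]
        rf = RandomForestClassifier(n_estimators=50, random_state=0)
        models.append((cols, rf.fit(X, target)))
    return models
```

With this sketch, predicting for a new instance means querying each subset's forest and decoding its powerset label back into the member labels; correlated labels are predicted jointly, which is the motivation the abstract gives for partitioning the label set.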
Acknowledgement
Yunming Ye’s work was supported in part by the National Key Technology R&D Program of MOST China under Grant No. 2014BAL05B06, the Shenzhen Science and Technology Program under Grant No. JCYJ20140417172417128, and the Shenzhen Strategic Emerging Industries Program under Grant No. JCYJ20130329142551746. Yan Li’s work was supported in part by NSFC under Grant No. 61303103 and the Shenzhen Science and Technology Program under Grant No. JCY20130331150354073.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Liu, F., Zhang, X., Ye, Y., Zhao, Y., Li, Y.: MLRF: multi-label classification through random forest with label-set partition. In: Huang, D.-S., Han, K. (eds.) Advanced Intelligent Computing Theories and Applications. ICIC 2015. LNCS, vol. 9227. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22053-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22052-9
Online ISBN: 978-3-319-22053-6