MapReduce-based adaptive random forest algorithm for multi-label classification

Wu, Qinghua; Wang, Haihui; Yan, Xuesong; Liu, Xiaobo

doi:10.1007/s00521-018-3900-8

Machine Learning - Applications & Techniques in Cyber Intelligence
Published: 26 November 2018

Neural Computing and Applications, Volume 31, pages 8239–8252 (2019)

Abstract

Because of the complexity of data characteristics, scholars have proposed multi-label learning in data mining to address the problem of extracting knowledge from information in the era of big data. The complexity of data structures means that traditional single-label learning methods can no longer meet the needs of technological development, and the importance of multi-label learning is becoming increasingly evident. The random forest (RF) algorithm is regarded as one of the best classification algorithms. In this study, the traditional decision tree algorithm was improved, and the traditional RF method was converted into an adaptive RF (ARF) method for multi-label classification. Experiments verified the effectiveness of the proposed method. The RF method may not be able to classify massive data in a short time, but Hadoop, developed by Apache, is suitable for data-intensive tasks. On this basis, we modified the MapReduce programming model to make it suitable for the proposed ARF method. The method was implemented on a cloud platform, and experiments verified the time efficiency of the parallel model.
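
The abstract gives no algorithmic detail, so the following is only a rough illustration of the general idea of training a multi-label forest in a map/reduce style. It is not the authors' ARF method: the one-split trees, the median threshold, and the names `train_stump`, `map_phase`, `reduce_phase`, and `predict` are all assumptions made for this sketch, and the two phases run sequentially in plain Python rather than on Hadoop.

```python
import random
from functools import reduce


def train_stump(rows, rng):
    """Fit a one-split multi-label tree on a bootstrap sample of rows.

    Each row is (features, labels); labels is a tuple of 0/1 flags.
    This stump is a stand-in for a full decision tree (assumption).
    """
    sample = [rng.choice(rows) for _ in rows]       # bootstrap sample
    feature = rng.randrange(len(sample[0][0]))      # random feature index
    values = sorted(x[feature] for x, _ in sample)
    threshold = values[len(values) // 2]            # split at the median
    n_labels = len(sample[0][1])

    def leaf(side):
        # Per-label majority vote; an empty side falls back to the sample.
        if not side:
            side = sample
        return tuple(int(2 * sum(y[j] for _, y in side) >= len(side))
                     for j in range(n_labels))

    left = [r for r in sample if r[0][feature] < threshold]
    right = [r for r in sample if r[0][feature] >= threshold]
    return feature, threshold, leaf(left), leaf(right)


def map_phase(partition, trees_per_partition, seed):
    """Map step: train several trees on one data partition."""
    rng = random.Random(seed)
    return [train_stump(partition, rng) for _ in range(trees_per_partition)]


def reduce_phase(forest_a, forest_b):
    """Reduce step: merge two partial forests into one."""
    return forest_a + forest_b


def predict(forest, x):
    """Predict every label by majority vote over all trees."""
    n_labels = len(forest[0][2])
    votes = [0] * n_labels
    for feature, threshold, left, right in forest:
        chosen = left if x[feature] < threshold else right
        for j in range(n_labels):
            votes[j] += chosen[j]
    return tuple(int(2 * v > len(forest)) for v in votes)


# Toy data (made up): two features, two labels.
data = [((v, 1.0 - v), (int(v >= 0.5), int(v < 0.5)))
        for v in (0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)]
partitions = [data[0::2], data[1::2]]               # two simulated splits
forests = [map_phase(p, trees_per_partition=5, seed=i)
           for i, p in enumerate(partitions)]
forest = reduce(reduce_phase, forests)
print(len(forest), predict(forest, (0.9, 0.1)))
```

Because each partition trains its trees independently, the map calls could in principle run on separate workers, with the reduce step only concatenating the partial forests. Whether the paper distributes the work this way is not stated in the abstract.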

Acknowledgements

This paper is supported by the Natural Science Foundation of China (No. 61673354), the Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan), the State Key Lab of Digital Manufacturing Equipment & Technology (DMETKF2018020), and Huazhong University of Science & Technology.

Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, 430205, Hubei, China
Qinghua Wu & Haihui Wang
School of Computer Science, China University of Geosciences, Wuhan, 430074, Hubei, China
Xuesong Yan
State Key Lab of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
Xuesong Yan
School of Automation, China University of Geosciences, Wuhan, 430074, Hubei, China
Xiaobo Liu

Corresponding author

Correspondence to Xuesong Yan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Q., Wang, H., Yan, X. et al. MapReduce-based adaptive random forest algorithm for multi-label classification. Neural Comput & Applic 31, 8239–8252 (2019). https://doi.org/10.1007/s00521-018-3900-8

Received: 16 August 2018
Accepted: 16 November 2018
Published: 26 November 2018
Issue Date: December 2019
DOI: https://doi.org/10.1007/s00521-018-3900-8
