
Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data

Published in: Journal of Scientific Computing

Abstract

Data used in particle physics analyses are inherently imbalanced: the events of interest are rare against a broad background. Such events can be isolated from the bulk through computationally intensive studies involving sophisticated analysis techniques. Classification algorithms provided by supervised machine learning (ML) offer an alternative to these classic techniques for interpreting skewed particle datasets, even in multi-particle-state analyses. In this study, the ground state of the bottomonium (\(\varUpsilon \)(1S)) and its excited states (\(\varUpsilon \)(2S) and \(\varUpsilon \)(3S)) were studied with a multiclass classification approach based on the random forest classifier (RFC), a novel ML application in particle analysis, combined with resampling techniques for dataset preprocessing and a modified weighting strategy. For this purpose, five widely used oversampling strategies and two hybrid strategies, which combine over- and undersampling, were adapted to the RFC. Moreover, a class-weighted RFC, the weighted random forest (WRF), was used in the analysis. Owing to the data structure, the performance of the applied models was evaluated with metrics derived from the confusion matrix. The results reveal that hybrid techniques implemented in the RFC are suitable for handling highly imbalanced classes. The G-mean and balanced accuracy (BAcc) scores of the upsilon states show that the model with the SMOTETomek strategy achieved the highest classification performance, around 90\(\%\), with high sensitivity, demonstrating the success of the multiclass classification application.



Acknowledgements

The author acknowledges the support from the Scientific and Technological Research Council of Turkey (TUBITAK) Project No. 119F302.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Serpil Yalcin Kuzu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Yalcin Kuzu, S. Random Forest Based Multiclass Classification Approach for Highly Skewed Particle Data. J Sci Comput 95, 21 (2023). https://doi.org/10.1007/s10915-023-02144-2

