A parallel intelligent algorithm applied to predict students dropping out of university

Lee, Zne-Jung; Lee, Chou-Yuan

doi:10.1007/s11227-019-03093-0

Published: 15 January 2020

Volume 76, pages 1049–1062, (2020)
The Journal of Supercomputing

Abstract

A student who drops out of university leaves before completing his or her studies. More and more students are dropping out, for a variety of reasons. Universities therefore need to predict in advance which students are likely to drop out, so that they can develop strategies to support those students and prevent them from leaving. Relative to the whole student population, dropping out is a rare event, which makes this a problem of imbalanced data. In such data, majority classes have far more instances than minority classes. Conventional algorithms tend to assign minority-class instances to the majority classes and thereby ignore the minority classes. As imbalanced data grow in size, these problems become difficult to solve with conventional algorithms. This paper proposes an algorithm to predict students dropping out of a university. In this algorithm, a parallel framework based on Apache Spark, with three approaches, is presented to process the data on students dropping out in parallel. Thereafter, improved bacterial foraging optimization (BFO) and an ensemble method are used to improve classification performance. The technique is applied to a real scenario from a university in Taiwan. A dataset taken from the UCI machine learning repository is also used to verify the correctness of the proposed parallel intelligent algorithm. The error rate for students dropping out is 7.65% for this algorithm, which shows that the proposed algorithm outperforms the compared techniques. The outcomes of the proposed algorithm provide useful information for decision making.
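
The imbalance problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows why a classifier that ignores the minority class can still report a low overall error rate. The cohort sizes (95 students who stay, 5 who drop out) and the helper names `majority_baseline`, `error_rate`, and `recall` are assumptions made up for this illustration.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label, i.e. what a trivial classifier always predicts."""
    return Counter(labels).most_common(1)[0][0]


def error_rate(y_true, y_pred):
    """Fraction of instances whose predicted label differs from the true label."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` instances that were predicted as `positive`."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0


# Hypothetical toy cohort (not data from the paper):
# 95 students who stay (label 0) and 5 who drop out (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
majority = majority_baseline(y_true)
y_pred = [majority] * len(y_true)

overall_error = error_rate(y_true, y_pred)
dropout_recall = recall(y_true, y_pred, positive=1)

print(f"overall error rate: {overall_error:.2f}")  # 0.05
print(f"dropout recall:     {dropout_recall:.2f}")  # 0.00
```

On this toy cohort the overall error rate is only 5%, yet no dropout is detected. This is why an error rate measured on the dropout class itself, as reported in the abstract, is more informative than overall accuracy for this kind of data.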

Acknowledgements

This research was supported by 2019 Fujian Province research grant No. FBJG20190284. It was also supported by Fuzhou University of International Studies and Trade research grants No. 2018KYTD-02 and FWB19003.

Author information

Authors and Affiliations

School of Technology, Fuzhou University of International Studies and Trade, Fujian, China
Zne-Jung Lee & Chou-Yuan Lee

Corresponding author

Correspondence to Zne-Jung Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lee, ZJ., Lee, CY. A parallel intelligent algorithm applied to predict students dropping out of university. J Supercomput 76, 1049–1062 (2020). https://doi.org/10.1007/s11227-019-03093-0

Published: 15 January 2020
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11227-019-03093-0
