A Random Forest-Based Self-training Algorithm for Study Status Prediction at the Program Level: minSemi-RF

Chau, Vo Thi Ngoc; Phung, Nguyen Hua

doi:10.1007/978-3-319-49397-8_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10053)

Included in the following conference series:

International Workshop on Multi-disciplinary Trends in Artificial Intelligence

Abstract

Educational data mining aims to uncover useful knowledge hidden in educational data to support better educational decision making. However, a large educational data set is not always available for a data mining task, owing to the peculiarities of the academic system and the time at which the data are collected. In our work, we focus on a study status prediction task at the program level, where the data are collected and processed once a year within the time frame of the program of interest in an academic credit system. When little labeled educational data is available, the effectiveness of the task may suffer; the task should therefore be treated as a semi-supervised learning process rather than a conventional supervised one, so that a larger set of unlabeled data can be exploited. In particular, we define a random forest-based self-training algorithm, named minSemi-RF, for the study status prediction task at the program level. The minSemi-RF algorithm combines the Tri-training and Self-training styles: it turns a random forest-based self-training algorithm into a parameter-free variant of the Tri-training algorithm. The algorithm produces a final classifier that inherits the advantages of a random forest model. Experimental results on real data sets show that our algorithm is effective and practical for early detection of in-trouble students in an academic credit system, compared with some existing semi-supervised learning methods.
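To make the general idea of forest-based self-training concrete, the following is a minimal sketch in Python. It is not minSemi-RF: the abstract does not give the algorithm's steps, and minSemi-RF is described as parameter-free, whereas this sketch uses an explicit confidence threshold. The "forest" here is a toy ensemble of randomized decision stumps standing in for a real random forest, and the features (a grade average and accumulated credits), the labels `in_trouble` and `on_track`, and all function names are hypothetical choices made for illustration only.

```python
import random
from collections import Counter

def train_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample using one randomly chosen feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (with replacement)
    f = rng.randrange(len(X[0]))                    # random feature, as in a random forest
    best = None
    for t in sorted({X[i][f] for i in idx}):        # candidate thresholds
        left = [y[i] for i in idx if X[i][f] <= t]
        right = [y[i] for i in idx if X[i][f] > t]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
        if best is None or errors < best[0]:
            best = (errors, f, t, l_lab, r_lab)
    return best[1:]                                 # (feature, threshold, left label, right label)

def train_forest(X, y, n_trees, rng):
    return [train_stump(X, y, rng) for _ in range(n_trees)]

def predict_with_confidence(forest, x):
    """Majority vote over the stumps; confidence is the winning vote share."""
    votes = Counter(l if x[f] <= t else r for f, t, l, r in forest)
    label, count = votes.most_common(1)[0]
    return label, count / len(forest)

def self_train(X_lab, y_lab, X_unlab, n_trees=25, threshold=0.9, max_rounds=10, seed=0):
    """Generic self-training: pseudo-label confident unlabeled points, then retrain."""
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    forest = train_forest(X_lab, y_lab, n_trees, rng)
    for _ in range(max_rounds):
        if not pool:
            break
        scored = [(x, *predict_with_confidence(forest, x)) for x in pool]
        confident = [(x, lab) for x, lab, conf in scored if conf >= threshold]
        if not confident:
            break                                   # nothing confident enough: stop
        for x, lab in confident:                    # move pseudo-labeled points to the labeled set
            X_lab.append(x)
            y_lab.append(lab)
        pool = [x for x, _, conf in scored if conf < threshold]
        forest = train_forest(X_lab, y_lab, n_trees, rng)
    return forest, len(pool)

# Hypothetical toy data: (grade average, accumulated credits) per student.
X_lab = [(4.0, 20), (4.5, 25), (5.0, 30), (7.0, 60), (7.5, 65), (8.0, 70)]
y_lab = ["in_trouble"] * 3 + ["on_track"] * 3
X_unlab = [(3.0, 15), (3.5, 18), (6.0, 45), (8.5, 75), (9.0, 80)]

forest, remaining = self_train(X_lab, y_lab, X_unlab)
print(predict_with_confidence(forest, (2.0, 5)), remaining)
```

The loop stops when no unlabeled point reaches the threshold or the pool is empty, so ambiguous points can stay unlabeled rather than being forced into a class. Choosing that threshold is exactly the kind of parameter a parameter-free design would avoid.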

Acknowledgments

This research is funded by Vietnam National University Ho Chi Minh City under grant number C2016-20-16.

Author information

Authors and Affiliations

Ho Chi Minh City University of Technology, Vietnam National University – HCMC, Ho Chi Minh City, Vietnam
Vo Thi Ngoc Chau & Nguyen Hua Phung

Corresponding author

Correspondence to Vo Thi Ngoc Chau.

Editor information

Editors and Affiliations

Mahasarakham University, Maha Sarakham, Thailand
Chattrakul Sombattheera
Harz University of Applied Sciences, Wernigerode, Germany
Frieder Stolzenburg
University of Science and Technology, Hong Kong, China
Fangzhen Lin
Macquarie University, North Ryde, New South Wales, Australia
Abhaya Nayak

About this paper

Cite this paper

Chau, V.T.N., Phung, N.H. (2016). A Random Forest-Based Self-training Algorithm for Study Status Prediction at the Program Level: minSemi-RF. In: Sombattheera, C., Stolzenburg, F., Lin, F., Nayak, A. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2016. Lecture Notes in Computer Science, vol 10053. Springer, Cham. https://doi.org/10.1007/978-3-319-49397-8_19

DOI: https://doi.org/10.1007/978-3-319-49397-8_19
Published: 10 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49396-1
Online ISBN: 978-3-319-49397-8
eBook Packages: Computer Science, Computer Science (R0)
