A Random Forest-Based Self-training Algorithm for Study Status Prediction at the Program Level: minSemi-RF

  • Conference paper
  • First Online:
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10053)

Abstract

Educational data mining aims to provide useful knowledge hidden in educational data for better support of educational decision making. However, a large set of educational data is not always ready for a data mining task, due to the peculiarities of the academic system and the timing of data collection. In our work, we focus on a study status prediction task at the program level, where the data are collected and processed once a year within the time frame of the program of interest in an academic credit system. When only a small amount of the educational data is labeled for the task, its effectiveness might suffer; the task should therefore be handled in a semi-supervised learning process rather than a conventional supervised learning process, so that a larger set of unlabeled data can be exploited. In particular, we define a random forest-based self-training algorithm, named minSemi-RF, for the study status prediction task at the program level. The minSemi-RF algorithm is designed as a combination of the Tri-training and Self-training styles, turning a random forest-based self-training algorithm into a parameter-free variant of the Tri-training algorithm. The resulting final classifier inherits the advantages of a random forest model. Based on experiments conducted on real data sets, our algorithm proves effective and practical for early detection of in-trouble students in an academic credit system, compared with several existing semi-supervised learning methods.
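For readers unfamiliar with the general pattern behind minSemi-RF, the sketch below illustrates plain confidence-threshold self-training around a random forest, written in Python with scikit-learn. It is not the authors' minSemi-RF: the paper replaces the fixed confidence threshold and round limit used here with a Tri-training-style, parameter-free acceptance rule, so the function name, the 0.9 threshold, and the round limit are illustrative assumptions only.

# Minimal sketch of generic random forest self-training (NOT the paper's
# exact minSemi-RF; its parameter-free, Tri-training-style acceptance rule
# is replaced here by an assumed fixed confidence threshold).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def self_train_rf(X_labeled, y_labeled, X_unlabeled,
                  confidence=0.9, max_rounds=10, random_state=0):
    """Iteratively pseudo-label confident unlabeled examples with a random forest."""
    X_l, y_l = np.asarray(X_labeled), np.asarray(y_labeled)
    X_u = np.asarray(X_unlabeled)
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    for _ in range(max_rounds):
        model.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = model.predict_proba(X_u)            # forest class probabilities
        conf = proba.max(axis=1)                    # confidence of the predicted class
        pseudo = model.classes_[proba.argmax(axis=1)]
        mask = conf >= confidence                   # accept only confident predictions
        if not mask.any():                          # nothing confident enough: stop
            break
        X_l = np.vstack([X_l, X_u[mask]])           # grow the labeled set
        y_l = np.concatenate([y_l, pseudo[mask]])
        X_u = X_u[~mask]
    model.fit(X_l, y_l)                             # final forest on the enlarged set
    return model

The classifier returned here is a single random forest trained on the enlarged labeled set, which mirrors the abstract's point that the final model can inherit the usual advantages of a random forest (for example, built-in ensembling and feature importances).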



Acknowledgments

This research is funded by Vietnam National University Ho Chi Minh City under grant number C2016-20-16.

Author information

Corresponding author

Correspondence to Vo Thi Ngoc Chau.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Chau, V.T.N., Phung, N.H. (2016). A Random Forest-Based Self-training Algorithm for Study Status Prediction at the Program Level: minSemi-RF. In: Sombattheera, C., Stolzenburg, F., Lin, F., Nayak, A. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2016. Lecture Notes in Computer Science, vol 10053. Springer, Cham. https://doi.org/10.1007/978-3-319-49397-8_19


  • DOI: https://doi.org/10.1007/978-3-319-49397-8_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49396-1

  • Online ISBN: 978-3-319-49397-8

  • eBook Packages: Computer Science, Computer Science (R0)
