
Students’ Course Results Prediction Based on Data Processing and Machine Learning Methods

Published in: Journal of Signal Processing Systems

Abstract

Smart learning has attracted growing attention in the domain of education, and correctly predicting students’ performance is one of its meaningful issues. Mature performance prediction methods are promising candidates for education systems, helping educators learn more about students’ learning states and provide students with timely academic help. Based on students’ historical course results and courses’ basic information, our work proposes a course results prediction model built on reasonable data processing and machine learning methods, which achieves the goal of out-of-sample results prediction well. Our model framework has two main parts: vector embedding algorithms for numeric and non-numeric features, and model optimization based on data augmentation and integration. First, we generate feature embedding vectors for non-numeric and numeric data separately; second, by applying relevant statistical methods we combine these embedding vectors into a synthetic feature matrix, then use the Gramian Angular Field and the Markov Transition Field to augment the data and extend the feature embedding through matrix integration. To verify the validity of our model in experiments, we implement a multilayer perceptron as the training model to perform the prediction task. Our model has good interpretability, so it is easy to understand what happens in the model framework and how the data is processed throughout the pipeline. This helps us make corresponding adjustments and tuning for different problems and scenarios, and thus obtain better prediction results.
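To make the described pipeline concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes the numeric and non-numeric embeddings have already been combined into a synthetic feature matrix (replaced here by random placeholder data), and it uses the pyts implementations of the Gramian Angular Field and Markov Transition Field together with a scikit-learn multilayer perceptron as stand-ins for the corresponding components.

```python
# Minimal sketch of the pipeline, under the assumptions stated above.
import numpy as np
from pyts.image import GramianAngularField, MarkovTransitionField
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_students, n_features = 500, 32
X = rng.random((n_students, n_features))      # placeholder synthetic feature matrix
y = rng.uniform(0, 100, n_students)           # placeholder course results to predict

# Augment each feature vector with GAF and MTF images, then flatten the
# images and concatenate them with the original features (matrix integration).
gaf = GramianAngularField(image_size=16, method="summation")
mtf = MarkovTransitionField(image_size=16, n_bins=8)
X_gaf = gaf.fit_transform(X).reshape(n_students, -1)
X_mtf = mtf.fit_transform(X).reshape(n_students, -1)
X_aug = np.hstack([X, X_gaf, X_mtf])

# Multilayer perceptron as the downstream training model for out-of-sample prediction.
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, test_size=0.2, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
print("out-of-sample R^2:", mlp.score(X_te, y_te))
```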



Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61977003), the New Liberal Arts Research and Reform Practice Projects of the Ministry of Education of China (No. 2021180002) and the Special Subject of Higher Education Informatization Research of the China Association of Higher Education (No. 2020XXHD05).

Author information


Corresponding author

Correspondence to Chuantao Yin.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Big Data Security Track.


About this article


Cite this article

Liu, J., Yin, C., Wang, K. et al. Students’ Course Results Prediction Based on Data Processing and Machine Learning Methods. J Sign Process Syst 94, 1199–1211 (2022). https://doi.org/10.1007/s11265-021-01739-y

