Students’ Course Results Prediction Based on Data Processing and Machine Learning Methods

Liu, Jinyang; Yin, Chuantao; Wang, Kunyang; Guan, Minghui; Wang, Xi; Zhou, Hong

doi:10.1007/s11265-021-01739-y

Published: 09 March 2022

Volume 94, pages 1199–1211, (2022)

Journal of Signal Processing Systems

Jinyang Liu¹,
Chuantao Yin ORCID: orcid.org/0000-0002-0742-0804²,
Kunyang Wang²,
Minghui Guan²,
Xi Wang² &
…
Hong Zhou¹

Abstract

Smart learning has attracted growing attention in education, and accurately predicting students’ performance is one of its meaningful problems. Mature performance prediction methods could be applied in education systems to help educators understand students’ learning states and provide students with timely academic help. Based on students’ historical course results and basic course information, we propose a course results prediction model that combines data processing and machine learning methods and achieves good out-of-sample prediction. The model framework has two main parts: vector embedding algorithms for numeric and non-numeric features, and model optimization based on data augmentation and integration. First, we generate separate feature embedding vectors for the non-numeric and the numeric data. Second, we apply relevant statistical methods to combine these embedding vectors into a synthetic feature matrix, then use the Gramian Angular Field and the Markov Transition Field to augment the data and extend the feature embedding by matrix integration. To verify the validity of the model experimentally, we use a multilayer perceptron as the training model for the prediction task. The model is highly interpretable: it is clear what happens within the framework and how the data is processed throughout the pipeline. This helps us adjust and tune the model for different problems and scenarios and thus obtain better prediction results.
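As an illustration of the two augmentation transforms named in the abstract, the following is a minimal NumPy sketch of a Gramian Angular Field (summation variant) and a Markov Transition Field computed from a one-dimensional feature vector. It is not the authors' implementation: the function names, the quantile binning, the choice of the summation variant, and the score values are all assumptions made for illustration, and the final flattening and concatenation is only one possible reading of "matrix integration".

```python
import numpy as np


def rescale(x, lo=-1.0, hi=1.0):
    """Min-max rescale a 1-D series into [lo, hi] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return lo + (hi - lo) * (x - x.min()) / span


def gramian_angular_summation_field(x):
    """GASF: G[i, j] = cos(phi_i + phi_j), where phi = arccos(rescaled x)."""
    phi = np.arccos(np.clip(rescale(x), -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])


def markov_transition_field(x, n_bins=4):
    """MTF: M[i, j] = estimated P(bin of x_i -> bin of x_j)."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges; each value is assigned a bin in 0..n_bins-1.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)
    # Count first-order transitions between consecutive values.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):
        W[a, b] += 1
    # Row-normalise counts into transition probabilities (empty rows stay 0).
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Spread the bin-level probabilities over every pair of positions.
    return W[bins[:, None], bins[None, :]]


# Made-up course scores for one student, used only to exercise the code.
scores = np.array([62.0, 75.0, 58.0, 90.0, 81.0, 70.0])
gasf = gramian_angular_summation_field(scores)
mtf = markov_transition_field(scores, n_bins=3)

# One simple way to turn both matrices into a single model input vector.
features = np.concatenate([gasf.ravel(), mtf.ravel()])
print(gasf.shape, mtf.shape, features.shape)
```

Both transforms map a length-n vector to an n-by-n matrix, so the augmented input grows quadratically with the embedding length. A vector such as `features` could then be passed to a multilayer perceptron, which is the training model the abstract mentions.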

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61977003), the New Liberal Arts Research and Reform Practice Projects of the Ministry of Education of China (No. 2021180002), and the Special Subject of Higher Education Informatization Research of the China Association of Higher Education (No. 2020XXHD05).

Author information

Authors and Affiliations

School of Economics and Management, Beihang University, Beijing, 100191, China
Jinyang Liu & Hong Zhou
Sino-French Engineer School, Beihang University, Beijing, 100191, China
Chuantao Yin, Kunyang Wang, Minghui Guan & Xi Wang

Corresponding author

Correspondence to Chuantao Yin.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on: Big Data Security Track

About this article

Cite this article

Liu, J., Yin, C., Wang, K. et al. Students’ Course Results Prediction Based on Data Processing and Machine Learning Methods. J Sign Process Syst 94, 1199–1211 (2022). https://doi.org/10.1007/s11265-021-01739-y

Received: 20 October 2021
Revised: 05 December 2021
Accepted: 29 December 2021
Published: 09 March 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11265-021-01739-y
