Predicting high-risk students using Internet access logs

Zhou, Qing; Quan, Wenjun; Zhong, Yu; Xiao, Wei; Mou, Chao; Wang, Yong

doi:10.1007/s10115-017-1086-5

Regular Paper
Published: 12 July 2017

Knowledge and Information Systems, Volume 55, pages 393–413 (2018)

Abstract

Predicting student performance (PSP) is of great use from an educational perspective, especially for high-risk students who need timely help to complete their studies. Previous PSP studies construct prediction models mainly on data collected from questionnaires or from specific learning systems. Instead, students’ Internet access logs were used in this study to predict high-risk students. Since the raw data in log files are high-dimensional, complex and full of noise, several methods were proposed for preprocessing the data source. A high-dimensional feature selection framework is then designed to prepare features for the construction of a prediction model with a good trade-off between computational efficiency and prediction performance. Experiments showed that the proposed prediction model can identify about 85% of high-risk students. Some online characteristics of high-risk students were also discovered, which might help student counselors and educational researchers better understand the relationship between students’ Internet use and their academic performance.
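The pipeline outlined in the abstract (preprocess logs, select features, evaluate how many high-risk students are found) can be illustrated with a minimal Python sketch. It is not the authors' framework: the record layout `(student_id, site_category, minutes)`, the toy data, and the function names `build_features`, `rank_features` and `recall` are all hypothetical, and a simple filter-style ranking by absolute Pearson correlation stands in for the paper's feature selection method. Reading "identify about 85% of high-risk students" as recall on the high-risk class is likewise an interpretation.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical, made-up log records: (student_id, site_category, minutes online).
LOGS = [
    ("s1", "games", 300), ("s1", "study", 20), ("s1", "news", 30),
    ("s2", "games", 250), ("s2", "study", 40), ("s2", "news", 30),
    ("s3", "games", 30), ("s3", "study", 200), ("s3", "news", 30),
    ("s4", "games", 60), ("s4", "study", 180), ("s4", "news", 30),
]
# Hypothetical labels: 1 = high-risk, 0 = not high-risk.
LABELS = {"s1": 1, "s2": 1, "s3": 0, "s4": 0}


def build_features(logs):
    """Aggregate raw records into total minutes per student and category."""
    totals = defaultdict(lambda: defaultdict(float))
    for student, category, minutes in logs:
        totals[student][category] += minutes
    return totals


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(features, labels):
    """Filter-style ranking: order categories by |correlation| with the label."""
    students = sorted(labels)
    categories = sorted({c for s in students for c in features.get(s, {})})
    y = [labels[s] for s in students]
    scores = {}
    for c in categories:
        x = [features.get(s, {}).get(c, 0.0) for s in students]
        scores[c] = abs(pearson(x, y))
    return sorted(scores, key=scores.get, reverse=True)


def recall(y_true, y_pred):
    """Share of truly high-risk students (label 1) that were predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


ranked = rank_features(build_features(LOGS), LABELS)
print(ranked)                               # least informative category comes last
print(recall([1, 1, 0, 0], [1, 0, 0, 0]))   # one of two high-risk students found
```

In this toy data the "news" category is identical for every student, so it carries no signal and ranks last; a filter step of this kind would discard such features before any classifier is trained, which is one way to trade prediction performance against computational cost.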

Acknowledgements

This research was supported by the National Natural Science Foundation of China under Grant No. 61472464, the National Natural Science Foundation Project of CQ CSTC (No. cstc2016jcyjA0276) and the Fundamental Research Funds for the Central Universities (Nos. 106112016CDJSK04XK09 and 106112016CDJXY180006).

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing, 400044, China
Qing Zhou, Wenjun Quan & Chao Mou
College of Foreign Languages and Cultures, Chongqing University, Chongqing, 400044, China
Yu Zhong & Wei Xiao
College of Economics and Management, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Yong Wang

Corresponding author

Correspondence to Wenjun Quan.

About this article

Cite this article

Zhou, Q., Quan, W., Zhong, Y. et al. Predicting high-risk students using Internet access logs. Knowl Inf Syst 55, 393–413 (2018). https://doi.org/10.1007/s10115-017-1086-5

Received: 08 November 2016
Revised: 26 April 2017
Accepted: 05 July 2017
Published: 12 July 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s10115-017-1086-5
