Student behavior in a web-based educational system: Exit intent prediction

doi:10.1016/j.engappai.2016.01.018

Engineering Applications of Artificial Intelligence

Volume 51, May 2016, Pages 136-149

https://doi.org/10.1016/j.engappai.2016.01.018

Highlights

• We propose a classifier that predicts whether a student will end the session with the next action.
• The proposed classifier processes dynamically changing data streams of logs.
• A list of the most important attributes for session end prediction is identified.
• An analysis of students’ behavior in a web-based educational system is provided.

Abstract

User behavior on the web is one of the most relevant research topics today. Mining user behavior in order to provide better content is popular, and predicting that behavior is equally interesting because it can improve the user experience. Businesses, moreover, clearly want such information to improve their services. In this paper we focus on the education domain, one of the most dynamically transforming areas. Web-based e-learning systems are growing in popularity because of the possibilities they offer to students. We analyze various sources of “e-student” feedback and discuss current challenges from the point of view of logging and feedback collection. Next, we focus on predicting a student’s next action within an e-learning application (in the sense of the “stay or leave?” question). Such information can help reduce student attrition by enabling various personalized approaches. We propose a classifier based on polynomial regression that uses stochastic gradient descent to learn the importance of attributes. In this way we are able to process a stream of data in a single pass and thus reflect dynamic changes in user behavior. Our experiments are based on log data collected from our web-based educational system ALEF over a three-year period. We found extensive heterogeneity in the students’ behavior, which we were able to handle by using individual weights calculated for every user.

Introduction

Each of us is a unique person who responds differently to what he or she perceives in the environment. Interacting with software and other systems can therefore be problematic, since most systems are designed to operate in one strictly defined way, regardless of who interacts with them. An increasing number of web-based systems now use personalization, because it allows content to be matched to a specific user’s needs and preferences. This process may take many forms: adapting the content presented to the user, reordering search results, rearranging interface components or changing their appearance, and so on.

In order to provide adaptive, personalized or otherwise adjusted content or services, the user’s needs, preferences and often attitudes have to be known and visible to the system. From this point of view, user feedback plays a crucial role. Because both users and businesses benefit from such tailored content or services (users reach relevant information or products in less time, businesses lower advertising costs and raise profits), the “obsession” with collecting user feedback and using it to improve the web grows day by day.

In the business sphere, the task of predicting whether a customer will stay with or leave a particular service (e.g., will not renew the contract) is referred to as attrition or conversion rate prediction. Such a prediction can be computed from the customer’s behavior and the feedback he/she leaves during the contract (which is usually long term). In this paper, however, we focus on predicting the end of a learning session for a specific student in a web-based e-learning system (prediction of the student’s exit intent within the session). This represents a novel application of the standard long-term attrition rate task to short-term behavior, which brings new challenges as well as new possibilities for predicting user behavior expressed by his/her next action(s).

E-learning systems are currently very popular, and millions of students learn with them (Jegatha Deborah et al., 2014). Moreover, like e-learning systems, MOOCs (Massive Open Online Courses) are often used to promote universities and allow them to sell certificates to graduates, which brings huge business potential. An e-learning system typically contains various courses whose learning materials are divided into logical units called learning objects (LOs), e.g., explanations, questions or practical exercises, presented in various forms combining text and multimedia. This rich information source can be used to improve e-learning based on specific students’ characteristics and behavior.

E-learning has many advantages over traditional education. One of the most important is that “e-students” can adjust the learning process to the needs and pace that suit them best. Jovanovic et al. (2012) proposed a clustering method for grouping students based on their cognitive learning style. In this way users can spend their time in e-learning effectively, because the system is able to automatically adapt the learning materials to their learning styles.

Another advantage of e-learning systems is the possibility of adapting the course structure, navigation or content exactly to the needs of each individual student. The concept of adaptation and personalization of web-based systems in the e-learning domain was introduced by Brusilovsky (1996) and is still intensively researched. Methods of personalized recommendation have been proposed, such as the hybrid approach of Klasnja-Milicevic et al. (2011), which, like the previous approach, is based on students’ learning styles, but also on frequent sequences of learned content.

Session end prediction represents an interesting challenge in e-learning. Students sometimes decide to stop learning before they have fully understood the materials. If the e-learning application were able to predict that a student will probably leave soon, it could motivate him/her to stay longer, remind him/her of learning objects he/she has not studied yet, or offer him/her some questions to test his/her actual knowledge. In this way the system would be able to help the student learn effectively, e.g., not to miss any of the topics needed to prepare well for his/her exam.

Our contributions presented in this paper are:

– An analysis of e-learning students’ behavior and of feedback types and sources.
– A novel approach to student exit intent prediction for the current session, designed for highly dynamic data in the form of a data stream.

In comparison to the state-of-the-art approaches and challenges in attrition rate prediction, including those in the e-learning domain, our proposed approach focuses on short-term behavior prediction (in the sense of a single session). The proposed approach takes full advantage of all available user characteristics, including students’ performance, personalities and learning styles. Thanks to the predictor architecture (a polynomial binary classifier trained with the stochastic gradient descent algorithm), we are able to process students’ actions within the system as a data stream and dynamically make predictions for current sessions. Such short-term prediction is not used in today’s web-based systems, including those in the e-learning domain.
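As a rough illustration of such an architecture, one common way to combine a polynomial model with stochastic gradient descent for a binary outcome is written out below. This is a sketch under assumptions, not the exact formulation of the paper: the logistic link $\sigma$, the feature map $\phi_d$ (all monomials of the attribute vector $x$ up to degree $d$) and the label convention $y = 1$ for "the session ends with the next action" are chosen here for illustration, and $\lambda$ denotes the learning rate.

```latex
\begin{align}
  \hat{p}(x) &= \sigma\!\left(w^{\top}\phi_d(x)\right),
  \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
  w &\leftarrow w - \lambda\,\bigl(\hat{p}(x) - y\bigr)\,\phi_d(x)
\end{align}
```

Each observed action triggers exactly one update of $w$, so the stream is processed in a single pass and the magnitude of each weight can be read as a rough indicator of the importance of the corresponding attribute term.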

The rest of the paper is structured as follows. The related work and the state of the art are presented in the next section. The section “E-students” challenges describes current trends in e-learning and its advantages over traditional learning approaches, focusing on student feedback. We demonstrate the most important features and the ways of collecting feedback from students’ actions using our e-learning system ALEF (Adaptive LEarning Framework). In the following section we focus on one task of user behavior prediction: we describe the proposed method for predicting the next user action in a web-based educational system, in the sense of whether he/she will stay or leave. The section Evaluation shows the results of the proposed method under various settings. Finally, in the section Conclusions we summarize the achieved results and discuss future work.

Section snippets

Related work

As students’ behavior and feedback are collected ex post (after the action has happened), machine learning and data mining techniques have to be used in order to predict students’ next actions. There are two basic data mining tasks: descriptive and predictive (Kantardzic, 2011). Descriptive tasks are primarily used to discover structure, relations or patterns in the mined data. They are used mainly in unsupervised learning approaches (Grira et al., 2004). On the other hand, predictive

“E-students” challenges

One of the most dynamic transformations in recent years can be seen in the education domain. The traditional form of face-to-face learning is being replaced by e-learning. This transformation creates a new kind of student, the so-called “e-student”. Not only do students enjoy e-learning more and more (the coursera.org user count has reached 10M¹), but standard educational institutions offer a great and still increasing

Prediction approach

To predict whether a user will leave the application with the next action or continue the session, we have to deal with several limitations. First, the data arrive in a continuous stream, which rules out batch approaches. The data stream consists of users’ actions (visits to learning objects), where every action is described by a set of attributes describing the currently visited learning object (LO), the user’s behavior in the current session
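The stream-processing constraint described in this snippet can be sketched in a few lines of Python. The sketch below is illustrative only and makes several assumptions that are not taken from the paper: a logistic loss, a degree-2 polynomial expansion, and the hypothetical names `poly_features` and `StreamingExitClassifier`. Each action is represented by a plain list of numeric attributes and is used for exactly one weight update, so no batch of past observations has to be stored.

```python
import math
from itertools import combinations_with_replacement


def poly_features(x, degree=2):
    """Expand attribute vector x into a bias term plus all monomials up to `degree`."""
    feats = [1.0]  # bias term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            term = 1.0
            for i in idx:
                term *= x[i]
            feats.append(term)
    return feats


class StreamingExitClassifier:
    """Hypothetical single-pass exit predictor (logistic loss is an assumption)."""

    def __init__(self, n_attributes, degree=2, learning_rate=0.1):
        self.degree = degree
        self.learning_rate = learning_rate
        n_feats = len(poly_features([0.0] * n_attributes, degree))
        self.weights = [0.0] * n_feats

    def predict_proba(self, x):
        """Estimated probability that the session ends with the next action."""
        feats = poly_features(x, self.degree)
        z = sum(w * f for w, f in zip(self.weights, feats))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, left_session):
        """One stochastic gradient step on a single observed action."""
        feats = poly_features(x, self.degree)
        error = self.predict_proba(x) - (1.0 if left_session else 0.0)
        for j, f in enumerate(feats):
            self.weights[j] -= self.learning_rate * error * f
```

In a streaming setting one would typically call `predict_proba` for the incoming action first and call `update` once the true outcome (the user stayed or left) is known, so that every prediction is made on data the model has not yet seen.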

Evaluation of proposed classifier

To evaluate the proposed classifier and to tune its features, we used several variants: a global classifier (using only attribute weights trained on all observations) and personalized “per user” (using attribute weights trained for each user individually) and “per course” (using attribute weights trained for each course separately) classifiers. Next, we determined optimal settings such as the value of the learning rate $λ$, the degree of oversampling and the weights of observations
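The three variants and the tuning knobs mentioned in this snippet can be sketched as follows. This is a minimal illustration under assumptions, not the evaluated implementation: the class name `GroupedExitClassifier`, the logistic loss, and the concrete way of applying observation weights (scaling the gradient of "exit" observations) and oversampling (repeating the update for "exit" observations) are all hypothetical choices. The only idea taken from the text is that a separate weight vector is kept globally, per user, or per course.

```python
import math
from collections import defaultdict


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


class GroupedExitClassifier:
    """Keeps one weight vector per group: global, per user, or per course."""

    VARIANTS = ("global", "per_user", "per_course")

    def __init__(self, n_features, variant="per_user", learning_rate=0.1,
                 exit_weight=1.0, oversample=1):
        if variant not in self.VARIANTS:
            raise ValueError(f"unknown variant: {variant}")
        self.variant = variant
        self.learning_rate = learning_rate  # plays the role of lambda
        self.exit_weight = exit_weight      # assumed weight of 'exit' observations
        self.oversample = oversample        # assumed repeat count for 'exit' observations
        self.weights = defaultdict(lambda: [0.0] * n_features)

    def _key(self, user_id, course_id):
        if self.variant == "per_user":
            return ("user", user_id)
        if self.variant == "per_course":
            return ("course", course_id)
        return ("global",)

    def predict_proba(self, features, user_id, course_id):
        w = self.weights[self._key(user_id, course_id)]
        return sigmoid(sum(wi * fi for wi, fi in zip(w, features)))

    def update(self, features, left_session, user_id, course_id):
        w = self.weights[self._key(user_id, course_id)]
        target = 1.0 if left_session else 0.0
        obs_weight = self.exit_weight if left_session else 1.0
        repeats = self.oversample if left_session else 1
        for _ in range(repeats):
            error = sigmoid(sum(wi * fi for wi, fi in zip(w, features))) - target
            for j, f in enumerate(features):
                w[j] -= self.learning_rate * obs_weight * error * f
```

With `variant="per_user"`, two students whose exit behavior depends on the same attribute in opposite ways end up with weights of opposite sign, which is one simple way to accommodate heterogeneous behavior; with `variant="global"` the same observations compete for a single weight vector.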

Conclusions

E-learning systems are growing in popularity thanks to the benefits they bring to learners. E-learning has become an essential part of traditional learning, enhancing it with a variety of courses offered to large numbers of students all over the world, personalization of learning materials to individual students’ needs, means for collaboration during the learning process and a variety of learning content (learning materials, exercises, tests, interactive videos,

Acknowledgment

This work was partially supported by Grants nos. VG 1/0774/16, VG 1/0646/15 and KEGA 009STU-4/2014, and it is a partial result of the Research and Development Operational Programme for the project International center of excellence for research of intelligent and secure information-communication technologies and systems, ITMS 26240120039, co-funded by the European Regional Development Fund.

References (48)

M. Barla et al. On the impact of adaptive test question selection for learning efficiency. Comput. Educ. (2010)
D. Delen. A comparative analysis of machine learning techniques for student retention management
A. Klasnja-Milicevic et al. E-Learning personalization based on hybrid recommendation strategy and learning style identification. Comput. Educ. (2011)
Y. Levy. Comparing dropouts and persistence in e-learning courses. Comput. Educ. (2007)
R.R. McCrae et al. A contemplated revision of the NEO five-factor inventory. Pers. Individ. Differ. (2004)
S. Moro et al. A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. (2014)
A. Pena-Ayala. Review: educational DM: a survey and a DM-based analysis of recent works. Expert Syst. Appl. (2014)
M.S.B. PhridviRaj et al. Data mining – past, present and future – a typical survey on data streams. Procedia Technol. (2014)
Bayer, J., Bydzovska, H., Geryk, J., Obsívač, T., Popelinsky, L., 2012. Predicting drop-out from social behaviour of...
M. Bielikova et al. Adaptive web-based textbook utilizing gaze data. J. Eye Mov. Res. (2015)
M. Bielikova et al. ALEF: from application to platform for adaptive collaborative learning
A. Bifet et al. Moa: massive online analysis
Bottou L., 2012. Stochastic gradient descent tricks, In: Neural Networks: Tricks of the Trade. LNCS, vol. 7700,...
L. Bottou et al. The tradeoffs of large scale learning. Adv. Neural Inf. Process. Syst. (2008)
Brusilovsky, P., 1996. Methods and techniques of adaptive hypermedia, In: User Modeling And User-adapted Interaction....
P. Brusilovsky et al. Adaptive Navigation Support in Educational Hypermedia: An Evaluation of the ISIS-Tutor. J. Comput. Inf. Technol. (1998)
Brusilovsky, P., 2004. KnowledgeTree: a distributed architecture for adaptive e-learning, In: Proceedings of the 13th...
Bukralia, R., 2010. Predicting dropout in online courses: comparison of classification techniques, In: Proceedings of...
Costa Jr., P.T., McCrae, R.R., 1989. The NEO-PI/NEO-FFI manual supplement, In: Psychological Assessment Resources,...
R.M. Felder et al. Learning and teaching styles in engineering education. Eng. Educ. (1988)
Goldberg, J.H., Stimson, M.J., Lewenstein, M., Scott, N., Wichansky, A.M., 2002. Eye tracking in web search tasks:...
Grira, N., Crucianu, M., Boujemaa, N., 2004. Unsupervised and semi-supervised clustering: a brief survey, In: A Review...
S. Halawa et al. Dropout prediction in MOOCs using learner activity features. Exp. Best. Pr. MOOCs (2014)
Holohan, E., Melia, M., McMullen, D., Pahl, C., 2005. Adaptive e-learning content generation based on semantic web...
