Abstract
One important purpose of web data mining is to reveal hidden characteristics of users, including their behavior. These characteristics are often used to analyze a user’s previous actions and preferences, and to predict future behavior. An average user session consists of only a few actions, which complicates user modeling and the subsequent prediction tasks. Such tasks are usually studied from a long-term point of view (e.g., contract renewal or quitting a course). In contrast, short-term user modeling plays an important role in web applications, where it helps to improve the user experience. Its shortcoming is that it often requires rich data, which are rarely available. For this reason, we propose a novel user model focused on capturing changes in the user’s behavior at the level of specific actions. The model enriches user actions by comparing data from the current session with data from previous sessions. Because the model is based on generally available data sources, the approach is applicable to a wide range of existing systems. We evaluate the model on the task of session end intent prediction in the e-learning and news domains. By reflecting differences in user behavior, we are able to predict a particular user’s intent to end the session within his or her next few actions. The results show that the proposed model achieves higher precision, accuracy, and session hit ratio than the baseline models.
Acknowledgements
This work was partially supported by the Slovak Research and Development Agency under contract No. APVV-15-0508, by the Scientific Grant Agency of the Slovak Republic under Grants No. VG 1/0667/18 and VG 1/0646/15, and is a partial result of the Research & Development Operational Programme for the projects ITMS 26240120039 and ITMS 26240220084, co-funded by the European Regional Development Fund.
Appendix
The proposed model consists of two attribute types (Fig. 1):
1. Comparative attributes \(A_C\)
2. Descriptive attributes \(A_D\)
There are nine comparative attributes in total, \(ac_1,\ldots ,ac_9\), described in Table 1. Each comparative attribute is computed for each of the eight time layers \(tl \in TL\) and each of the two model parts \(mp \in MP\), resulting in \(9 \times 8 \times 2 = 144\) comparative attribute values. The formal definitions of the nine comparative attributes (Table 1) are as follows.
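The size of the resulting attribute grid can be checked with a minimal sketch. The names `ac_i` follow the text; the time layer labels `tl_1` to `tl_8` are placeholders, since the individual layers are not listed in this appendix, and the model part names are taken from the personal/global distinction described later.

```python
from itertools import product

N_ATTRIBUTES = 9    # ac_1 ... ac_9 (Table 1)
N_TIME_LAYERS = 8   # placeholder labels; the concrete layers are not listed here
MODEL_PARTS = ("personal", "global")


def feature_keys():
    """Return one (attribute, time_layer, model_part) key per model value."""
    attributes = [f"ac_{i}" for i in range(1, N_ATTRIBUTES + 1)]
    time_layers = [f"tl_{j}" for j in range(1, N_TIME_LAYERS + 1)]
    return list(product(attributes, time_layers, MODEL_PARTS))


keys = feature_keys()
print(len(keys))  # 9 * 8 * 2 = 144
```

Each key identifies one value in the feature vector built for a user action.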
Let \(S_{u,n}\), \(n\in {\mathbb {N}}\), be the current session of a user u, with \(S_{u,n} \in S_u\). Then \(Act_{u,n}\) is the set of actions the user performed in session \(S_{u,n}\).
Let \(ts_{u,n}\) be the time spent by user u in the session \(S_{u,n}\).
Let \(S_{cat,u}\) be the set of all sessions that terminate after a visit to a page with category cat, and let \(S_{u}\) be the set of all sessions of user u.
Let \(Act_{cat,u,n}\) be the subset of actions \(Act_{u,n}\) with category cat.
Let \(pageseq_m\) be a sequence of length m of consecutively visited pages, i.e., user actions \(Act_{u,n}\). Let \(pageseq_{m,end}\) be a sequence of length m of consecutively visited pages, i.e., user actions \(Act_{u,n}\), that results in the session end.
Let \(catseq_{m}\) be a sequence of length m of consecutively visited categories, i.e., user actions \(Act_{u,n}\) with category cat. Let \(catseq_{m,end}\) be a sequence of length m of consecutively visited categories, i.e., user actions \(Act_{u,n}\) with category cat, that results in the session end.
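One possible reading of these sequence definitions is sketched below. It assumes a session is stored as an ordered list of visited pages (or categories), and that the sequence which "results in the session end" is the last m items of the session. The function names are illustrative and not taken from the paper.

```python
def sequences_of_length(items, m):
    """All runs of m consecutively visited items (pages or categories)."""
    return [tuple(items[i:i + m]) for i in range(len(items) - m + 1)]


def ending_sequence(items, m):
    """The run of m items that closes the session, or None if it is too short."""
    if m <= 0 or len(items) < m:
        return None
    return tuple(items[-m:])


pages = ["home", "news", "sport", "news"]
print(sequences_of_length(pages, 2))
print(ending_sequence(pages, 2))
```

The same two helpers apply to category sequences if the list holds the category of each visited page instead of the page itself.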
Let \(Cat_{distinct}\) be the set of distinct categories visited in actions \(Act_{u,n}\) within session \(S_{u,n}\).
Let \(cat_{last}\) be the category of the page currently visited by the user.
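The category-related notation above can be illustrated with a short sketch. It assumes each action is recorded as a `(page, category)` pair in visiting order, which is a representation chosen for this example only.

```python
def actions_with_category(actions, cat):
    """Subset of the session's actions whose page has category `cat`."""
    return [action for action in actions if action[1] == cat]


def distinct_categories(actions):
    """Set of distinct categories visited within the session."""
    return {category for _, category in actions}


def last_category(actions):
    """Category of the page the user is currently visiting, if any."""
    return actions[-1][1] if actions else None


session = [("p1", "sport"), ("p2", "politics"), ("p3", "sport")]
print(actions_with_category(session, "sport"))
print(distinct_categories(session))
print(last_category(session))
```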
As described in Sect. 3, descriptive attributes are calculated for two model parts (personal and global). We define these attributes (Attr1–Attr9) in Eqs. 5–13 from the perspective of the personal part. The global part attributes 1–9 are calculated in the same way, but all users, not only user u, are considered. In other words, every metric for user u is replaced by its average over all users. For example, in the personal part \(ts_{u,n}\) is the time spent by user u within session n (the last session), whereas in the global part \(ts_{u,n}\) is the average time spent by all users within their last sessions.
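The difference between the two model parts can be sketched for the session time example. The record layout (dictionaries with `user`, `start`, and `duration` keys) and the function names are assumptions made for this illustration.

```python
from statistics import mean


def personal_last_session_time(sessions, user):
    """Personal part: time spent by `user` in their most recent session."""
    own = [s for s in sessions if s["user"] == user]
    return max(own, key=lambda s: s["start"])["duration"]


def global_last_session_time(sessions):
    """Global part: average of the last-session times over all users."""
    users = {s["user"] for s in sessions}
    return mean(personal_last_session_time(sessions, u) for u in users)


sessions = [
    {"user": "a", "start": 1, "duration": 100},
    {"user": "a", "start": 5, "duration": 40},
    {"user": "b", "start": 3, "duration": 80},
]
print(personal_last_session_time(sessions, "a"))  # 40
print(global_last_session_time(sessions))         # mean of 40 and 80
```

The global value reuses the personal computation for every user and averages the results, which mirrors the statement that the metrics for user u are replaced by the average over all users.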
Following the idea of the proposed model, each of the descriptive attributes (Attr1–Attr9) is also calculated for several time layers. In other words, for a specific time layer (e.g., month), all attributes are calculated based only on data from the last month.
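Restricting an attribute to a time layer can be sketched as a filter applied before the attribute is computed. The 30-day window used for the month layer, the record layout, and the function names are assumptions for this example; the text does not specify how layer boundaries are defined.

```python
from datetime import datetime, timedelta
from statistics import mean

MONTH = timedelta(days=30)  # assumed length of the "month" layer


def sessions_in_layer(sessions, now, window):
    """Keep only sessions that started within `window` before `now`."""
    return [s for s in sessions if now - window <= s["start"] <= now]


def layer_mean_duration(sessions, now, window):
    """Mean session time within the layer, or None if the layer is empty."""
    selected = sessions_in_layer(sessions, now, window)
    return mean(s["duration"] for s in selected) if selected else None


sessions = [
    {"start": datetime(2018, 1, 1), "duration": 100},
    {"start": datetime(2018, 3, 20), "duration": 40},
]
now = datetime(2018, 4, 1)
print(layer_mean_duration(sessions, now, MONTH))  # only the March session counts
```

Other layers would differ only in the `window` argument passed to the same filter.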
Kompan, M., Kassak, O. & Bielikova, M. The Short-term User Modeling for Predictive Applications. J Data Semant 8, 21–37 (2019). https://doi.org/10.1007/s13740-018-0095-1
DOI: https://doi.org/10.1007/s13740-018-0095-1