Abstract
Classification methods are becoming an increasingly useful part of the data analyst’s standard toolbox in many application domains. The specific data and domain characteristics of social media tools used in online educational contexts make it challenging to train high-quality classifiers that give meaningful insight into learners’ activity patterns. Decision trees are currently a standard and very successful model for classification tasks. In this paper, we introduce a custom-designed data analysis pipeline for predicting “spam” and “don’t care” learners in the eMUSE online educational environment. The trained classifiers use social media traces as independent variables and the learner’s final grade as the dependent variable. The analysis evaluates the activities performed by learners and the similarity of two derived data models. Experiments performed on social media traces collected over five years from 285 learners show satisfactory classification results that may be further used in a production environment. Accurate identification of “spam” and “don’t care” users may in turn help produce better classification models for the remaining “regular” learners.
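As a rough illustration of how a decision tree picks a split on activity features, the sketch below computes the information gain of every candidate threshold and keeps the best one. It is not the pipeline described in the paper: the abstract does not state which split criterion is used, information gain is just one common choice, and the feature names (`blog_posts`, `tweets`), the class labels, and all the numbers are made-up toy values.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the best binary split.

    A split sends rows whose feature value is <= threshold to the left branch.
    Ties keep the first split found.
    """
    base = entropy(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put at least one row on each side
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[0]:
                best = (gain, f, threshold)
    return best


# Toy, made-up activity counts per learner: (blog_posts, tweets).
rows = [(0, 1), (1, 0), (0, 0), (12, 30), (8, 25), (15, 40)]
labels = ["dont_care", "dont_care", "dont_care", "regular", "regular", "regular"]

gain, feature, threshold = best_split(rows, labels)
print(f"split on feature {feature} at <= {threshold}, gain = {gain:.3f}")
```

A full tree would apply `best_split` recursively to each branch until the labels are pure or a stopping rule is met. In practice an established implementation would normally be used instead of hand-written code like this.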



Acknowledgements
This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS-UEFISCDI, project number PN-II-RU-TE-2014-4-2604.
Cite this article
Mihăescu, M.C., Popescu, P.Ş. & Popescu, E. Data analysis on social media traces for detection of “spam” and “don’t care” learners. J Supercomput 73, 4302–4323 (2017). https://doi.org/10.1007/s11227-017-2011-0
DOI: https://doi.org/10.1007/s11227-017-2011-0