Data analysis on social media traces for detection of “spam” and “don’t care” learners

The Journal of Supercomputing

Abstract

Classification methods are becoming an increasingly useful part of the standard data analyst’s toolbox in many application domains. The specific data and domain characteristics of social media tools used in online educational contexts make it challenging to train high-quality classifiers that bring important insight into the activity patterns of learners. Decision trees are currently a standard and very successful model for classification tasks. In this paper, we introduce a custom-designed data analysis pipeline for predicting “spam” and “don’t care” learners from the eMUSE online educational environment. The trained classifiers rely on social media traces as independent variables and on the final grade of the learner as the dependent variable. The analysis evaluates the activities performed by learners and the similarity of two derived data models. Experiments performed on social media traces from five years and 285 learners show satisfactory classification results that may be further used in a production environment. Accurate identification of “spam” and “don’t care” users may in turn have a great impact on producing better classification models for the remaining “regular” learners.
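As a rough illustration of how a decision tree relates an activity trace to a grade, the sketch below computes the entropy-based information gain of a single threshold split, the criterion used by ID3-style tree induction. It is not the pipeline described in the paper: the feature name `blog_posts`, the label `grade_band`, and all data values are hypothetical and chosen only to make the example small.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Return the midpoint between adjacent distinct values with the highest gain."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical toy data: one activity count per learner and a coarse grade label.
blog_posts = [0, 1, 2, 8, 9, 12]
grade_band = ["low", "low", "low", "high", "high", "high"]

threshold = best_threshold(blog_posts, grade_band)
gain = information_gain(blog_posts, grade_band, threshold)
print(f"split at blog_posts <= {threshold}, information gain = {gain:.3f} bits")
```

A full tree learner repeats this choice recursively over all features. On real traces the classes rarely separate this cleanly, so the gain of the best split is usually well below one bit.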

Acknowledgements

This work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS-UEFISCDI, project number PN-II-RU-TE-2014-4-2604.

Corresponding author

Correspondence to Marian Cristian Mihăescu.

Cite this article

Mihăescu, M.C., Popescu, P.Ş. & Popescu, E. Data analysis on social media traces for detection of “spam” and “don’t care” learners. J Supercomput 73, 4302–4323 (2017). https://doi.org/10.1007/s11227-017-2011-0

  • DOI: https://doi.org/10.1007/s11227-017-2011-0
