Random Sampling Effects on e-Learners Cluster Sizes Using Clustering Algorithms

  • Conference paper
Intelligent Computing (SAI 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1228)

Abstract

e-Learners are a growing audience and a target for most higher education institutions, including universities and colleges. However, few studies focus on their progress and their interactions with the various educational tools used during the learning process. This study examines the assessment of e-learners using the Open University Learning Analytics dataset (UK). The paper deploys model-based clustering and the Bayesian Information Criterion (BIC) for k-means to support the clustering (density-based clustering and k-means) and to define the cluster size. The study found that, during assessment, students might belong to two or three clusters. The novelty of the study lies, first, in its initial use of unsupervised learning (model-based clustering) to discover the number of clusters among online learners. Second, the study tests the validity of the clusters by applying two algorithms, k-means and density-estimate clustering, to a training set randomly split at 40% and 60%. This form of semi-supervised learning (that is, assigning the discovered number of clusters to the k-means and density-estimate clustering algorithms) was used to test the effect of the two algorithms on the cluster size while fixing the sample number and the percentage used when splitting the dataset. The one exception, in the cluster percentages, occurs only with two clusters and only when using the density-estimation clustering method. Thus, both algorithms, k-means and density-based clustering using k-means, have a similar effect on the cluster sizes, except for the density-based method when clustering into two clusters.
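The split-and-compare step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the data are synthetic (two hypothetical Gaussian blobs standing in for assessment features), the tooling the author used is not stated on this page, and only a plain k-means is shown. The BIC-based choice of the number of clusters and the density-based clusterer are omitted, so `k=2` is simply assumed here. All function names are illustrative.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of (x, y) tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def cluster_percentages(labels, k):
    """Cluster sizes as percentages of the sample, largest first."""
    n = len(labels)
    return sorted((100.0 * labels.count(j) / n for j in range(k)), reverse=True)

def make_data(seed=42):
    """Synthetic stand-in data (an assumption): two blobs of 300 and 200 points."""
    rng = random.Random(seed)
    blobs = [((0.0, 0.0), 300), ((5.0, 5.0), 200)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for (cx, cy), n in blobs for _ in range(n)]

def split_and_cluster(points, fraction, k, seed=1):
    """Draw a random sample of the given fraction and cluster it with k-means."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    cut = int(round(fraction * len(shuffled)))
    sample = shuffled[:cut]
    return len(sample), cluster_percentages(kmeans(sample, k, seed=seed), k)

data = make_data()
results = {frac: split_and_cluster(data, frac, k=2) for frac in (0.4, 0.6)}
for frac, (n, pcts) in results.items():
    print(f"{int(frac * 100)}% sample (n={n}): cluster sizes {[round(p, 1) for p in pcts]} %")
```

If the cluster percentages stay close across the 40% and 60% samples, the random split has little effect on relative cluster size, which is the kind of comparison the abstract reports for the two algorithms.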

Corresponding author

Correspondence to Muna Al Fanah.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Al Fanah, M. (2020). Random Sampling Effects on e-Learners Cluster Sizes Using Clustering Algorithms. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Computing. SAI 2020. Advances in Intelligent Systems and Computing, vol 1228. Springer, Cham. https://doi.org/10.1007/978-3-030-52249-0_51
