DOI: 10.1145/3231644.3231656

Replicating MOOC predictive models at scale

Published: 26 June 2018

ABSTRACT

We present a case study in predictive model replication for student dropout in Massive Open Online Courses (MOOCs), using a large and diverse dataset (133 sessions of 28 unique courses offered by two institutions). This experiment was run on the MOOC Replication Framework (MORF), which makes it feasible to fully replicate complex machine-learned models, from raw data to model evaluation. We provide an overview of the MORF platform architecture and functionality, and demonstrate its use through a case study. In this replication of [41], we contextualize and evaluate the results of the previous work using statistical tests and a more effective model evaluation scheme. We find that only some of the original findings replicate across this larger and more diverse sample of MOOCs, with others replicating significantly in the opposite direction. Our analysis also reveals results that are highly relevant to the prediction task but were not reported in the original experiment. This work demonstrates the importance of replicating predictive modeling research in MOOCs using large and diverse datasets, illuminates the challenges of doing so, and describes our freely available, open-source software framework to overcome barriers to replication.
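
The abstract's mention of statistical tests across 133 course sessions, together with the citation of Stouffer [35], suggests a meta-analytic combination of per-course results. The sketch below is a minimal illustration of that general technique, not MORF's actual implementation: the per-course p-values and effect directions are hypothetical inputs, and the function name is invented for illustration.

    import numpy as np
    from scipy import stats

    def stouffer_combined_test(p_values, directions):
        """Combine per-course one-sided p-values via Stouffer's method.

        p_values: hypothetical one-sided p-value for each course session.
        directions: +1 if a session's effect matched the original finding,
            -1 if it pointed in the opposite direction.
        """
        p = np.asarray(p_values, dtype=float)
        signs = np.asarray(directions, dtype=float)
        # Map each p-value to a z-score, signed by its effect direction.
        z = signs * stats.norm.isf(p)
        # Stouffer's Z: sum of the z-scores, scaled by sqrt(k) for k sessions.
        z_combined = z.sum() / np.sqrt(len(z))
        # Two-sided p-value for the combined statistic.
        p_combined = 2 * stats.norm.sf(abs(z_combined))
        return z_combined, p_combined

    # Toy usage with made-up values for three course sessions:
    z, p = stouffer_combined_test([0.04, 0.20, 0.01], [+1, +1, -1])
    print(f"combined Z = {z:.2f}, two-sided p = {p:.3f}")

For the unsigned case, scipy.stats.combine_pvalues(pvalues, method='stouffer') performs the same combination directly; the hand-rolled version above only makes the direction handling explicit.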

References

  1. J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and S. Crossley. Studying MOOC completion at scale using the MOOC replication framework. In Proceedings of the International Conference on Learning Analytics and Knowledge, pages 71--78, Mar. 2018.
  2. J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and C. A. Spann. Replicating 21 findings on student success in online learning. Technology, Instruction, Cognition, and Learning, pages 313--333, 2016.
  3. G. Balakrishnan and D. Coetzee. Predicting student retention in massive open online courses using hidden Markov models. Technical report, Univ. Calif. at Berkeley EECS Dept., 2013.
  4. C. Boettiger. An introduction to Docker for reproducible research. Oper. Syst. Rev., 49(1):71--79, Jan. 2015.
  5. K. Bollen, J. T. Cacioppo, R. M. Kaplan, J. A. Krosnick, J. L. Olds, and H. Dean. Social, behavioral, and economic sciences perspectives on robust and reliable science. Technical report, NSF Subcommittee on Replicability in Science, 2015.
  6. S. Boyer and K. Veeramachaneni. Transfer learning for predictive models in massive open online courses. In Artificial Intelligence in Education, pages 54--63. Springer, Cham, June 2015.
  7. M. J. Brandt, H. IJzerman, A. Dijksterhuis, F. J. Farach, J. Geller, R. Giner-Sorolla, J. A. Grange, M. Perugini, J. R. Spies, and A. van 't Veer. The replication recipe: What makes for a convincing replication? J. Exp. Soc. Psych., 50:217--224, 2014.
  8. C. Brooks, C. Thompson, and S. Teasley. A time series interaction analysis method for building predictive models of learners using log data. In Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, pages 126--135. ACM, Mar. 2015.
  9. J. Cito, V. Ferme, and H. C. Gall. Using Docker containers to improve reproducibility in software and web engineering research. In Web Engineering, Lecture Notes in Computer Science, pages 609--612. Springer, Cham, June 2016.
  10. Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251):aac4716, Aug. 2015.
  11. C. Collberg, T. Proebsting, G. Moraila, A. Shankaran, Z. Shi, and A. M. Warren. Measuring reproducibility in computer systems research. Technical report, Univ. Arizona Dept. of Comp. Sci., 2014.
  12. S. Crossley, L. Paquette, M. Dascalu, D. S. McNamara, and R. S. Baker. Combining click-stream data with NLP tools to better understand MOOC completion. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, pages 6--14, 2016.
  13. J. P. Daries, J. Reich, J. Waldo, E. M. Young, J. Whittinghill, A. D. Ho, D. T. Seaton, and I. Chuang. Privacy, anonymity, and big data in the social sciences. Commun. ACM, 57(9):56--63, 2014.
  14. F. Dernoncourt, C. Taylor, K. Veeramachaneni, and U.-M. O'Reilly. MOOCdb: Developing standards and systems for MOOC data science. Technical report, MIT, 2013.
  15. T. G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, pages 1--15. Springer, Berlin, Heidelberg, June 2000.
  16. D. Donoho. 50 years of data science. In Tukey Centennial Workshop, Princeton, NJ, pages 1--41, 2015.
  17. B. J. Evans, R. B. Baker, and T. S. Dee. Persistence patterns in massive open online courses (MOOCs). J. Higher Educ., 87(2):206--242, Mar. 2016.
  18. M. Fei and D. Y. Yeung. Temporal models for predicting student dropout in massive open online courses. In Intl. Conf. on Data Mining Workshop (ICDMW), pages 256--263, 2015.
  19. J. Fogarty, R. S. Baker, and S. E. Hudson. Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction. In Proceedings of Graphics Interface 2005, pages 129--136, 2005.
  20. J. A. Gámez, J. L. Mateo, and J. M. Puerta. Learning Bayesian networks by hill climbing: Efficient methods based on progressive restriction of the neighborhood. Data Min. Knowl. Discov., 22(1-2):106--148, Jan. 2011.
  21. J. Gardner and C. Brooks. Dropout model evaluation in MOOCs. In Proceedings of the Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), 2018.
  22. J. Gardner and C. Brooks. Evaluating predictive models of student success: Closing the methodological gap. Journal of Learning Analytics, 2018. In press.
  23. J. Gardner and C. Brooks. Student success prediction in MOOCs. User Modeling and User-Adapted Interaction, 2018.
  24. J. Gardner, C. Brooks, J. M. L. Andres, and R. Baker. MORF: A framework for MOOC predictive modeling and replication at scale. 2018.
  25. A. Gelman and E. Loken. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 2013.
  26. J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29--36, Apr. 1982.
  27. MITx and HarvardX. HarvardX-MITx Person-Course Academic Year 2013 De-Identified dataset, version 2.0, May 2014.
  28. R. F. Kizilcec and C. Brooks. Diverse big data and randomized field experiments in MOOCs. In C. Lang, G. Siemens, A. Wise, and D. Gašević, editors, Handbook of Learning Analytics, pages 211--222. Society for Learning Analytics Research, 2017.
  29. R. F. Kizilcec and S. Halawa. Attrition and achievement gaps in online learning. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pages 57--66, 2015.
  30. M. C. Makel and J. A. Plucker. Facts are more important than novelty: Replication in the education sciences. Educ. Res., 43(6):304--316, 2014.
  31. D. Merkel. Docker: Lightweight Linux containers for consistent development and deployment. Linux J., 2014(239), Mar. 2014.
  32. B. A. Nosek, J. R. Spies, and M. Motyl. Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci., 7(6):615--631, Nov. 2012.
  33. T. Sinha, N. Li, P. Jermann, and P. Dillenbourg. Capturing "attrition intensifying" structural traits from didactic interaction sequences of MOOC learners. Sept. 2014.
  34. V. Stodden and S. Miguez. Best practices for computational science: Software infrastructure and environments for reproducible and extensible research. Journal of Open Research Software, 2(1):1--6, 2013.
  35. S. A. Stouffer. Adjustment during army life. Princeton University Press, 1949.
  36. V. Tinto. Research and practice of student retention: What next? J. Coll. Stud. Ret., 8(1):1--19, 2006.
  37. T. J. Tobin and G. M. Sugai. Using sixth-grade school records to predict school violence, chronic discipline problems, and high school outcomes. J. Emot. Behav. Disord., 7(1):40--53, Jan. 1999.
  38. K. Veeramachaneni, U.-M. O'Reilly, and C. Taylor. Towards feature engineering at scale for data from massive open online courses. July 2014.
  39. J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley. Delving deeper into MOOC student dropout prediction. Feb. 2017.
  40. D. H. Wolpert. Stacked generalization. Neural Netw., 5(2):241--259, 1992.
  41. W. Xing, X. Chen, J. Stein, and M. Marcinkowski. Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Comput. Human Behav., 58:119--129, 2016.
  42. D. Yang, T. Sinha, D. Adamson, and C. P. Rosé. Turn on, tune in, drop out: Anticipating student dropouts in massive open online courses. In Proceedings of the 2013 NIPS Data-Driven Education Workshop, volume 11, page 14, 2013.

Published in
    L@S '18: Proceedings of the Fifth Annual ACM Conference on Learning at Scale
    June 2018
    391 pages
    ISBN: 9781450358866
    DOI: 10.1145/3231644

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article

    Acceptance Rates

    L@S '18 paper acceptance rate: 24 of 58 submissions, 41%. Overall acceptance rate: 117 of 440 submissions, 27%.
