ABSTRACT
We present a case study in predictive model replication for student dropout in Massive Open Online Courses (MOOCs) using a large and diverse dataset (133 sessions of 28 unique courses offered by two institutions). The experiment was run on the MOOC Replication Framework (MORF), which makes it feasible to fully replicate complex machine-learned models, from raw data to model evaluation. We provide an overview of the MORF platform architecture and functionality, and demonstrate its use through a case study. In this replication of [41], we contextualize and evaluate the results of the previous work using statistical tests and a more effective model evaluation scheme. We find that only some of the original findings replicate across this larger and more diverse sample of MOOCs, while others replicate significantly in the opposite direction. Our analysis also reveals results that are highly relevant to the prediction task but were not reported in the original experiment. This work demonstrates the importance of replicating predictive modeling research in MOOCs using large and diverse datasets, illuminates the challenges of doing so, and describes our freely available, open-source software framework for overcoming barriers to replication.
REFERENCES
- J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and S. Crossley. Studying MOOC completion at scale using the MOOC replication framework. In Proceedings of the International Conference on Learning Analytics and Knowledge, pages 71--78, Mar. 2018.
- J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and C. A. Spann. Replicating 21 findings on student success in online learning. Technology, Instruction, Cognition, and Learning, pages 313--333, 2016.
- G. Balakrishnan and D. Coetzee. Predicting student retention in massive open online courses using hidden Markov models. Technical report, Univ. Calif. at Berkeley EECS Dept., 2013.
- C. Boettiger. An introduction to Docker for reproducible research. Oper. Syst. Rev., 49(1):71--79, Jan. 2015.
- K. Bollen, J. T. Cacioppo, R. M. Kaplan, J. A. Krosnick, J. L. Olds, and H. Dean. Social, behavioral, and economic sciences perspectives on robust and reliable science. Technical report, NSF Subcommittee on Replicability in Science, 2015.
- S. Boyer and K. Veeramachaneni. Transfer learning for predictive models in massive open online courses. In Artificial Intelligence in Education, pages 54--63. Springer, Cham, June 2015.
- M. J. Brandt, H. IJzerman, A. Dijksterhuis, F. J. Farach, J. Geller, R. Giner-Sorolla, J. A. Grange, M. Perugini, J. R. Spies, and A. van 't Veer. The replication recipe: What makes for a convincing replication? J. Exp. Soc. Psych., 50:217--224, 2014.
- C. Brooks, C. Thompson, and S. Teasley. A time series interaction analysis method for building predictive models of learners using log data. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, pages 126--135. ACM, Mar. 2015.
- J. Cito, V. Ferme, and H. C. Gall. Using Docker containers to improve reproducibility in software and web engineering research. In Web Engineering, Lecture Notes in Computer Science, pages 609--612. Springer, Cham, June 2016.
- Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251):aac4716, Aug. 2015.
- C. Collberg, T. Proebsting, G. Moraila, A. Shankaran, Z. Shi, and A. M. Warren. Measuring reproducibility in computer systems research. Technical report, Univ. Arizona Dept. of Comp. Sci., 2014.
- S. Crossley, L. Paquette, M. Dascalu, D. S. McNamara, and R. S. Baker. Combining click-stream data with NLP tools to better understand MOOC completion. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, pages 6--14, 2016.
- J. P. Daries, J. Reich, J. Waldo, E. M. Young, J. Whittinghill, A. D. Ho, D. T. Seaton, and I. Chuang. Privacy, anonymity, and big data in the social sciences. Commun. ACM, 57(9):56--63, 2014.
- F. Dernoncourt, C. Taylor, K. Veeramachaneni, and U.-M. O'Reilly. MOOCdb: Developing standards and systems for MOOC data science. Technical report, MIT, 2013.
- T. G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, pages 1--15. Springer, Berlin, Heidelberg, June 2000.
- D. Donoho. 50 years of data science. In Princeton NJ, Tukey Centennial Workshop, pages 1--41, 2015.
- B. J. Evans, R. B. Baker, and T. S. Dee. Persistence patterns in massive open online courses (MOOCs). J. Higher Educ., 87(2):206--242, Mar. 2016.
- M. Fei and D. Y. Yeung. Temporal models for predicting student dropout in massive open online courses. In Intl. Conf. on Data Mining Workshop (ICDMW), pages 256--263, 2015.
- J. Fogarty, R. S. Baker, and S. E. Hudson. Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction. In Proceedings of Graphics Interface 2005, pages 129--136, 2005.
- J. A. Gámez, J. L. Mateo, and J. M. Puerta. Learning Bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood. Data Min. Knowl. Discov., 22(1-2):106--148, Jan. 2011.
- J. Gardner and C. Brooks. Dropout model evaluation in MOOCs. In Proceedings of the Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), 2018.
- J. Gardner and C. Brooks. Evaluating predictive models of student success: Closing the methodological gap. The Journal of Learning Analytics, 2018. In press.
- J. Gardner and C. Brooks. Student success prediction in MOOCs. User Modeling and User-Adapted Interaction, 2018.
- J. Gardner, C. Brooks, J. M. L. Andres, and R. Baker. MORF: A framework for MOOC predictive modeling and replication at scale. 2018.
- A. Gelman and E. Loken. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 2013.
- J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29--36, Apr. 1982.
- MITx and HarvardX. HarvardX-MITx Person-Course Academic Year 2013 De-Identified dataset, version 2.0, May 2014.
- R. F. Kizilcec and C. Brooks. Diverse big data and randomized field experiments in MOOCs. In C. Lang, G. Siemens, A. Wise, and D. Gašević, editors, Handbook of Learning Analytics, pages 211--222. Society for Learning Analytics Research, 2017.
- R. F. Kizilcec and S. Halawa. Attrition and achievement gaps in online learning. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pages 57--66, 2015.
- M. C. Makel and J. A. Plucker. Facts are more important than novelty: Replication in the education sciences. Educ. Res., 43(6):304--316, 2014.
- D. Merkel. Docker: Lightweight Linux containers for consistent development and deployment. Linux J., 2014(239), Mar. 2014.
- B. A. Nosek, J. R. Spies, and M. Motyl. Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci., 7(6):615--631, Nov. 2012.
- T. Sinha, N. Li, P. Jermann, and P. Dillenbourg. Capturing "attrition intensifying" structural traits from didactic interaction sequences of MOOC learners. Sept. 2014.
- V. Stodden and S. Miguez. Best practices for computational science: Software infrastructure and environments for reproducible and extensible research. Journal of Open Research Software, 2(1):1--6, 2013.
- S. A. Stouffer. Adjustment during army life. Princeton University Press, 1949.
- V. Tinto. Research and practice of student retention: What next? J. Coll. Stud. Ret., 8(1):1--19, 2006.
- T. J. Tobin and G. M. Sugai. Using sixth-grade school records to predict school violence, chronic discipline problems, and high school outcomes. J. Emot. Behav. Disord., 7(1):40--53, Jan. 1999.
- K. Veeramachaneni, U.-M. O'Reilly, and C. Taylor. Towards feature engineering at scale for data from massive open online courses. July 2014.
- J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley. Delving deeper into MOOC student dropout prediction. Feb. 2017.
- D. H. Wolpert. Stacked generalization. Neural Netw., 5(2):241--259, 1992.
- W. Xing, X. Chen, J. Stein, and M. Marcinkowski. Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Comput. Human Behav., 58:119--129, 2016.
- D. Yang, T. Sinha, D. Adamson, and C. P. Rosé. Turn on, tune in, drop out: Anticipating student dropouts in massive open online courses. In Proceedings of the 2013 NIPS Data-driven education workshop, volume 11, page 14, 2013.