research-article

Replicating MOOC predictive models at scale

Authors:
Josh Gardner, University of Michigan
Christopher Brooks, University of Michigan
Juan Miguel Andres, University of Pennsylvania
Ryan Baker, University of Pennsylvania

L@S '18: Proceedings of the Fifth Annual ACM Conference on Learning at Scale, June 2018, Article No.: 1, Pages 1–10. https://doi.org/10.1145/3231644.3231656

Published: 26 June 2018

ABSTRACT

We present a case study in predictive model replication for student dropout in Massive Open Online Courses (MOOCs) using a large and diverse dataset (133 sessions of 28 unique courses offered by two institutions). This experiment was run on the MOOC Replication Framework (MORF), which makes it feasible to fully replicate complex machine-learned models, from raw data to model evaluation. We provide an overview of the MORF platform architecture and functionality, and demonstrate its use through a case study. In this replication of [41], we contextualize and evaluate the results of the previous work using statistical tests and a more effective model evaluation scheme. We find that only some of the original findings replicate across this larger and more diverse sample of MOOCs, with others replicating significantly in the opposite direction. Our analysis also reveals results that are highly relevant to the prediction task but were not reported in the original experiment. This work demonstrates the importance of replication of predictive modeling research in MOOCs using large and diverse datasets, illuminates the challenges of doing so, and describes our freely available, open-source software framework to overcome barriers to replication.

References

[1] J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and S. Crossley. Studying MOOC completion at scale using the MOOC replication framework. In Proceedings of the International Conference on Learning Analytics and Knowledge, pages 71–78, Mar. 2018.
[2] J. M. L. Andres, R. S. Baker, G. Siemens, D. Gašević, and C. A. Spann. Replicating 21 findings on student success in online learning. Technology, Instruction, Cognition, and Learning, pages 313–333, 2016.
[3] G. Balakrishnan and D. Coetzee. Predicting student retention in massive open online courses using hidden markov models. Technical report, Univ. Calif. at Berkeley EECS Dept., 2013.
[4] C. Boettiger. An introduction to docker for reproducible research. Oper. Syst. Rev., 49(1):71–79, Jan. 2015.
[5] K. Bollen, J. T. Cacioppo, R. M. Kaplan, J. A. Krosnick, J. L. Olds, and H. Dean. Social, behavioral, and economic sciences perspectives on robust and reliable science. Technical report, NSF Subcommittee on Replicability in Science, 2015.
[6] S. Boyer and K. Veeramachaneni. Transfer learning for predictive models in massive open online courses. In Artificial Intelligence in Education, pages 54–63. Springer, Cham, June 2015.
[7] M. J. Brandt, H. IJzerman, A. Dijksterhuis, F. J. Farach, J. Geller, R. Giner-Sorolla, J. A. Grange, M. Perugini, J. R. Spies, and A. van 't Veer. The replication recipe: What makes for a convincing replication? J. Exp. Soc. Psych., 50:217–224, 2014.
[8] C. Brooks, C. Thompson, and S. Teasley. A time series interaction analysis method for building predictive models of learners using log data. In Proceedings of the Fifth International Conference on Learning Analytics And Knowledge, pages 126–135. ACM, Mar. 2015.
[9] J. Cito, V. Ferme, and H. C. Gall. Using docker containers to improve reproducibility in software and web engineering research. In Web Engineering, Lecture Notes in Computer Science, pages 609–612. Springer, Cham, June 2016.
[10] Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349(6251):aac4716, Aug. 2015.
[11] C. Collberg, T. Proebsting, G. Moraila, A. Shankaran, Z. Shi, and A. M. Warren. Measuring reproducibility in computer systems research. Technical report, Univ. Arizona Dept. of Comp. Sci., 2014.
[12] S. Crossley, L. Paquette, M. Dascalu, D. S. McNamara, and R. S. Baker. Combining click-stream data with NLP tools to better understand MOOC completion. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, pages 6–14, 2016.
[13] J. P. Daries, J. Reich, J. Waldo, E. M. Young, J. Whittinghill, A. D. Ho, D. T. Seaton, and I. Chuang. Privacy, anonymity, and big data in the social sciences. Commun. ACM, 57(9):56–63, 2014.
[14] F. Dernoncourt, C. Taylor, K. Veeramachaneni, and U.-M. O'Reilly. MOOCdb: Developing standards and systems for MOOC data science. Technical report, MIT, 2013.
[15] T. G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, pages 1–15. Springer, Berlin, Heidelberg, June 2000.
[16] D. Donoho. 50 years of data science. In Princeton NJ, Tukey Centennial Workshop, pages 1–41, 2015.
[17] B. J. Evans, R. B. Baker, and T. S. Dee. Persistence patterns in massive open online courses (MOOCs). J. Higher Educ., 87(2):206–242, Mar. 2016.
[18] M. Fei and D. Y. Yeung. Temporal models for predicting student dropout in massive open online courses. In Intl. Conf. on Data Mining Workshop (ICDMW), pages 256–263, 2015.
[19] J. Fogarty, R. S. Baker, and S. E. Hudson. Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction. In Proceedings of Graphics Interface 2005, pages 129–136, 2005.
[20] J. A. Gámez, J. L. Mateo, and J. M. Puerta. Learning bayesian networks by hill climbing: efficient methods based on progressive restriction of the neighborhood. Data Min. Knowl. Discov., 22(1–2):106–148, Jan. 2011.
[21] J. Gardner and C. Brooks. Dropout model evaluation in MOOCs. In Proceedings of the Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18).
[22] J. Gardner and C. Brooks. Evaluating predictive models of student success: Closing the methodological gap. The Journal of Learning Analytics, 2018. In press.
[23] J. Gardner and C. Brooks. Student success prediction in MOOCs. User Modeling and User-Adapted Interaction, 2018.
[24] J. Gardner, C. Brooks, J. M. L. Andres, and R. Baker. MORF: A framework for MOOC predictive modeling and replication at scale. 2018.
[25] A. Gelman and E. Loken. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University, 2013.
[26] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, Apr. 1982.
[27] M. A. HarvardX. HarvardX-MITx Person-Course academic year 2013 De-Identified dataset, version 2.0, May 2014. Title of the publication associated with this dataset: HarvardX-MITx Person-Course Academic Year 2013 De-Identified dataset, version 2.0.
[28] R. F. Kizilcec and C. Brooks. Diverse big data and randomized field experiments in MOOCs. In C. Lang, G. Siemens, A. Wise, and D. Gašević, editors, Handbook of Learning Analytics, pages 211–222. Society for Learning Analytics Research, 2017.
[29] R. F. Kizilcec and S. Halawa. Attrition and achievement gaps in online learning. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pages 57–66, 2015.
[30] M. C. Makel and J. A. Plucker. Facts are more important than novelty: Replication in the education sciences. Educ. Res., 43(6):304–316, 2014.
[31] D. Merkel. Docker: Lightweight linux containers for consistent development and deployment. Linux J., 2014(239), Mar. 2014.
[32] B. A. Nosek, J. R. Spies, and M. Motyl. Scientific utopia: II. restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci., 7(6):615–631, Nov. 2012.
[33] T. Sinha, N. Li, P. Jermann, and P. Dillenbourg. Capturing "attrition intensifying" structural traits from didactic interaction sequences of MOOC learners. Sept. 2014.
[34] V. Stodden and S. Miguez. Best practices for computational science: Software infrastructure and environments for reproducible and extensible research. Journal of Open Research Software, 2(1):1–6, 2013.
[35] S. A. Stouffer. Adjustment during army life. Princeton University Press, 1949.
[36] V. Tinto. Research and practice of student retention: What next? J. Coll. Stud. Ret., 8(1):1–19, 2006.
[37] T. J. Tobin and G. M. Sugai. Using Sixth-Grade school records to predict school violence, chronic discipline problems, and high school outcomes. J. Emot. Behav. Disord., 7(1):40–53, Jan. 1999.
[38] K. Veeramachaneni, U.-M. O'Reilly, and C. Taylor. Towards feature engineering at scale for data from massive open online courses. July 2014.
[39] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley. Delving deeper into MOOC student dropout prediction. Feb. 2017.
[40] D. H. Wolpert. Stacked generalization. Neural Netw., 5(2):241–259, 1992.
[41] W. Xing, X. Chen, J. Stein, and M. Marcinkowski. Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Comput. Human Behav., 58:119–129, 2016.
[42] D. Yang, T. Sinha, D. Adamson, and C. P. Rosé. Turn on, tune in, drop out: Anticipating student dropouts in massive open online courses. In Proceedings of the 2013 NIPS Data-driven education workshop, volume 11, page 14, 2013.

Published in
L@S '18: Proceedings of the Fifth Annual ACM Conference on Learning at Scale
June 2018
391 pages
ISBN: 9781450358866
DOI: 10.1145/3231644
Conference Chair: Rose Luckin, UCL Institute of Education, UK
Program Chairs: Scott Klemmer, University of California at San Diego; Kenneth Koedinger, Carnegie Mellon University
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
L@S '18 Paper Acceptance Rate: 24 of 58 submissions, 41%. Overall Acceptance Rate: 117 of 440 submissions, 27%.