Abstract
The development of data science curricula has gained attention in academia and industry, yet less is known about the pedagogical practices and tools employed in data science education. Through a systematic literature review, we summarize the pedagogical practices and tools used in prior data science initiatives at the higher education level. Because we found the content presented in the reviewed studies to be diverse and incomparable, we follow the Technological Pedagogical Content Knowledge (TPACK) framework and characterize the quality of the technological and pedagogical knowledge in each study. TPACK is a widely established framework for teaching with information and communication technology, yet it is seldom used to analyze data science pedagogy. To apply this framework in a more structured way, we list the tools employed in each reviewed study to summarize its technological knowledge quality, and we examine whether each study meets the requirements of Cognitive Apprenticeship theory to summarize its pedagogical knowledge quality. Of the 23 reviewed studies, 14 met the requirements of Cognitive Apprenticeship theory. These studies include hands-on experiences, promote students' active learning, position the instructor as a coach from whom students seek guidance, introduce students to the real-world industry demands of data and data scientists, and provide meaningful learning resources and feedback across the various stages of their data science initiatives. While each study presents at least one tool for teaching data science, we found it difficult to assess the technological knowledge of these initiatives, because the studies fall short of explaining how students learn to operate the tools and become proficient in using them throughout a course or program. Our review aims to highlight implications of these practices and tools for future research on data science pedagogy.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12102-y/MediaObjects/10639_2023_12102_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12102-y/MediaObjects/10639_2023_12102_Fig2_HTML.png)
Data availability
Data sharing does not apply to this article as no datasets were generated or analyzed during the current study.
Acknowledgements
This study was funded by the Canada Research Chair Program and the Canada Foundation for Innovation.
Ethics declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Memarian, B., Doleck, T. Data science pedagogical tools and practices: A systematic literature review. Educ Inf Technol 29, 8179–8201 (2024). https://doi.org/10.1007/s10639-023-12102-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10639-023-12102-y