Abstract
This article discusses the key elements of the Data Science Technology course offered to postgraduate students enrolled in the Master of Data Science program. The course complements the existing curriculum by providing the skills to handle Big Data platforms and tools, in addition to data science activities. We frame the discussion around three main requirements: the need to draw on key skills from two dimensions, namely Data Science and Big Data; the need for a cluster-based computing platform; and the need for that platform to be accessible. We address these requirements by presenting the course design and its assessments, the configuration of the computing platform, and the strategy used to enable flexible accessibility. In terms of course design, the course contributes several innovative elements and covers multiple key areas of the data science body of knowledge and multiple quadrants of the job and skills matrix. In terms of the computing platform, a stable deployment of a Hadoop cluster with flexible accessibility, prompted by the pandemic, has been established. Furthermore, our experience with the implementation shows that the cluster can handle computing problems with larger datasets than those used in the semesters within the scope of the study. We also provide some reflections and highlight future improvements.
Availability of data and materials
The datasets generated during and/or analyzed during the current study are not publicly available due to security reasons, but are available from the corresponding author on reasonable request.
Abbreviations
- AI: Artificial Intelligence
- BDA: Big Data Analytics
- BoK: Body of Knowledge
- DSA: Data Science and Analytics
- EDA: Exploratory Data Analysis
- HDFS: Hadoop Distributed File System
- IoT: Internet of Things
- MCO: Movement Control Order
- MDEC: Malaysia Digital Economy Corporation
- PC: Personal Computer
- UiTM: Universiti Teknologi MARA
- VM: Virtual Machine
- YARN: Yet Another Resource Negotiator
Acknowledgements
We would like to take this opportunity to thank the School of Computing Sciences (formerly known as the Faculty of Computer and Mathematical Sciences), College of Computing, Informatics and Media, and Universiti Teknologi MARA (UiTM) for providing support to deploy the cluster for this course.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
About this article
Cite this article
Ismail, A., Mutalib, S. & Haron, H. Data science technology course: The design, assessment and computing environment perspectives. Educ Inf Technol 28, 10209–10234 (2023). https://doi.org/10.1007/s10639-022-11558-8
DOI: https://doi.org/10.1007/s10639-022-11558-8