DOI: 10.1145/3344429.3372503
research-article

An Empirical Approach to Understanding Data Science and Engineering Education

Published: 18 December 2019

Abstract

As data science is an evolving field, existing definitions reflect this uncertainty with overloaded terms and inconsistency. As a result of the field's fluidity, there is often a mismatch between what data-related programs teach, what employers expect, and the actual tasks data scientists are performing. In addition, the tools available to data scientists are not necessarily the tools being taught; textbooks do not seem to meet curricular needs; and empirical evidence does not seem to support existing program design. Currently, the field appears to be bifurcating into data science (DS) and data engineering (DE), with specific but overlapping roles in the combined data science and engineering (DSE) lifecycle. However, curriculum design has not yet caught up to this evolution. This working group report shows an empirical and data-driven view of the data-related education landscape, and includes several recommendations for both academia and industry that are based on this analysis.


      Published In

      ITiCSE-WGR '19: Proceedings of the Working Group Reports on Innovation and Technology in Computer Science Education
      December 2019
      218 pages
      ISBN:9781450375672
      DOI:10.1145/3344429
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. accreditation
      2. data engineering education
      3. data science education
      4. global standards
      5. iticse 2019 working group
      6. multidisciplinary education

      Qualifiers

      • Research-article

      Funding Sources

      • FCT - Fundação para a Ciência e Tecnologia
      • US National Science Foundation

      Conference

      ITiCSE '19
      Acceptance Rates

      Overall Acceptance Rate 552 of 1,613 submissions, 34%

      Cited By

      • (2024) AI Technologies in Engineering Education. AI-Enhanced Teaching Methods, 61-87. DOI: 10.4018/979-8-3693-2728-9.ch003. Online publication date: 14-Jun-2024
      • (2024) Curriculum Analysis for Data Systems Education. Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 2, 761-762. DOI: 10.1145/3649405.3659529. Online publication date: 8-Jul-2024
      • (2024) Seeing the Whole Elephant - A Comprehensive Framework for Data Education. Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, 248-254. DOI: 10.1145/3626252.3630922. Online publication date: 7-Mar-2024
      • (2024) Data Science Analysis Curricula: Academy Offers vs Professionals Learning Needs in Costa Rica. Information Systems and Technologies, 366-375. DOI: 10.1007/978-3-031-45645-9_35. Online publication date: 14-Feb-2024
      • (2023) Enabling Machine Learning in Software Architecture Frameworks. 2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN), 92-93. DOI: 10.1109/CAIN58948.2023.00021. Online publication date: May-2023
      • (2022) Oaths and the ethics of automated data: limits to porting the Hippocratic oath from medicine to data science. Cultural Studies 37:1, 168-189. DOI: 10.1080/09502386.2022.2042577. Online publication date: 23-Feb-2022
      • (2020) High Performance Computing Education. Proceedings of the Working Group Reports on Innovation and Technology in Computer Science Education, 51-74. DOI: 10.1145/3437800.3439203. Online publication date: 17-Jun-2020
      • (2020) Toward High Performance Computing Education. Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education, 504-505. DOI: 10.1145/3341525.3394989. Online publication date: 15-Jun-2020
