The data scientist profile and its representativeness in the European e-Competence framework and the skills framework for the information age

https://doi.org/10.1016/j.ijinfomgt.2017.07.010

Abstract

Activities in today's world are largely supported by data-driven web applications that make extensive use of databases and data services. This phenomenon has led to the rise of Data Scientists as professionals of major relevance, who extract value from data and create state-of-the-art data artifacts that generate even greater value. In recent years, the term Data Scientist has attracted significant attention. Consequently, it is relevant to understand its origin, knowledge base and skills set, in order to adequately describe the profile and distinguish it from others such as Business Analyst. This work proposes a conceptual model for the professional profile of a Data Scientist and evaluates the representativeness of this profile in two commonly recognized competences/skills frameworks in the field of Information and Communications Technology (ICT), namely the European e-Competence Framework (e-CF) and the Skills Framework for the Information Age (SFIA). The results indicate that a significant part of the knowledge base and skills set of Data Scientists is related to ICT competences/skills, including programming, machine learning and databases. The Data Scientist professional profile has an adequate representativeness in these two frameworks, but it is mainly seen as a multi-disciplinary profile, combining contributions from different areas, such as computer science, statistics and mathematics.

Introduction

Nowadays, we live in a world surrounded by data-driven web applications. The internet we know is built upon databases and data services. The role of a Data Scientist has never been so important, since it makes possible the creation of data products, which acquire their value from the data itself and create more data, of even greater value, as a result (Loukides, 2010).

Data Science has attracted considerable attention in recent years (Fig. 1). Press (2013) argues that Data Science emerges as the coupling of Statistics and Computer Science. In fact, if we take a closer look at the most common related search in Trends (2017), we find the term Computer Science. Besides, in 2009, Hal Varian, Chief Economist at Google, said that “the sexy job in the next ten years will be statisticians”, referring to the importance of understanding data, processing it, extracting value from it, visualizing it and communicating results. Consequently, it seems reasonable to assume that the knowledge base expected from a Data Scientist goes beyond the skills of a computer scientist (Cleveland, 2001), or a statistician (Cleveland, 2001; Warden, 2016), or even the coupling of the two.

The term Data Scientist as a professional profile dates back to 2008, when D.J. Patil and Jeff Hammerbacher met to share experiences about the development of the data and analytics groups at Facebook and LinkedIn and discussed what to call the people on their teams, since their respective organizations were growing rapidly (Davenport and Patil, 2012; Patil, 2011; Press, 2013). The Data Scientist profile was then established to designate someone who works on data applications that immediately and massively impact organizations (Patil, 2011), who understands how to find answers to relevant business questions, and who explores a voluminous and diverse set of data in a scientific way (Davenport & Patil, 2012).

However, when D.J. Patil and Jeff Hammerbacher met to share experiences, the term Data Scientist was not an immediate choice. Patil (2011) states: “Business Analyst seemed too limiting… Data Analyst was a contender, but we felt that title might limit what people could do… Research scientist was a reasonable job title used by companies… However, we felt that most research scientists worked on projects that were futuristic and abstract…”. The Data Science Association (DSA, 2016) also distinguishes Data Science from Business Analytics by defining the latter as “the practice of iterative, methodical exploration of an organization’s data with emphasis on statistical analysis”, in contrast to the inquisitive nature of Data Science. As for the Data Scientist vs Data Analyst discussion, it becomes easier to distinguish these two professional profiles if we look at the Data Scientist role as an evolution of the Data Analyst role (IBM, 2016): the Data Scientist is expected to have strong business acumen and to communicate findings in order to overcome business challenges, namely the ones whose solution brings the most value to the organization. According to IBM (2016), another factor that sets these two profiles apart is the data sources, since Data Scientists tend to explore multiple disparate sources instead of one single source. Data Scientists have an inquisitive nature: they ask questions in order to understand the meaning of the data.

As previously mentioned, the knowledge base expected from a Data Scientist exceeds that of a Computer Scientist or a Statistician, and it may even exceed the knowledge base that results from the combination of both profiles. Therefore, in this work, we assume that the Data Scientist profile adequately fits under the Information Systems umbrella. We do not intend to claim that the Data Scientist profile is more suitable to Information Systems than to Computer Science, Statistics, or any other field, like Experimental Physics or Systems Biology, which Davenport and Patil (2012) also identified. We simply assume that the knowledge base and skills set needed in a Data Scientist are present in the Information Systems field, and we aim to understand if (and how) the Data Scientist profile is represented in the e-CF and SFIA frameworks. We based our assumption on two relevant aspects:

1. In the framework of Bacon and Fitzgerald (2001) for the field of Information Systems, it can be observed that this field concerns automation and leveraging of “information for knowledge work, customer satisfaction and business performance” (nature of data, information and knowledge; human-computer interface; information relevance, value and cost; data quality; organizational learning…) through the use of Information and Communication Technologies (ICT), such as software, databases and data warehouses, among other concepts, which are key elements in the knowledge base and skills set of a Data Scientist, as shown in the literature and discussed later in this document;

2. The education map made available by the Association for Information Systems (AIS) (AIS, 2016) illustrates Data Science as an emerging main concept among 514 programs of 336 institutions. AIS (2016) also highlights other closely related topics, such as big data, predictive analytics, data mining, distributed systems, databases, programming and decision support systems.

Regarding the ICT competences/skills frameworks analyzed in this paper, e-CF (e-CF, 2016) is described as a reference of competences required and applied at the ICT workplace, while SFIA (SFIA, 2016) is aimed at organizational design and talent management in Information Technology (IT). Despite the disparity in terminology (ICT vs IT), these frameworks aim to standardize competences in a broad field with a relatively ambiguous use of terms. Consequently, to test the hypothesis that the Data Scientist profile is adequately represented in these frameworks, the following method was used: from the scientific community, academic programs, job opportunities, professional associations and certifications in Data Science, we gathered the knowledge base (what a Data Scientist must know about) and the skills set (what a Data Scientist is able to do). Then, a conceptual model was proposed, and we looked for this knowledge base and skills set in e-CF and SFIA to evaluate whether the profile is adequately represented.
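As a rough illustration of what checking representativeness can look like, the sketch below compares a set of profile concepts against a set of framework terms and reports which concepts are matched and which are missing. This is not the authors' procedure, which relies on a qualitative reading of e-CF and SFIA; the function name `coverage` and the `framework_terms` list are hypothetical placeholders and do not reproduce the content of either framework.

```python
def coverage(profile_concepts, framework_terms):
    """Return (share of profile concepts found, matched concepts, missing concepts).

    Matching is a naive case-insensitive exact match, chosen only to keep
    the illustration short.
    """
    profile = {c.lower() for c in profile_concepts}
    framework = {t.lower() for t in framework_terms}
    matched = profile & framework
    missing = profile - framework
    ratio = len(matched) / len(profile) if profile else 0.0
    return ratio, sorted(matched), sorted(missing)


# A few concepts named in the abstract, used here as a toy knowledge base.
knowledge_base = ["programming", "machine learning", "databases", "statistics"]

# Hypothetical framework vocabulary (an assumption, NOT taken from e-CF or SFIA).
framework_terms = ["Programming", "Databases", "Machine Learning", "Project Management"]

ratio, matched, missing = coverage(knowledge_base, framework_terms)
print(f"{ratio:.0%} covered; missing: {missing}")
```

With real data, exact string matching would be too strict, since frameworks describe competences in their own wording; a manual mapping between profile concepts and framework entries would be closer to the analysis described here.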

In summary, the goal of this paper is to propose a conceptual model for the Data Scientist profile based on the analysis of information from the scientific community, academic programs, job opportunities, professional associations and certifications in Data Science, covering the knowledge base (what a Data Scientist must know about) and the skills set (what a Data Scientist is able to do). This model can foster future research on this topic, and can help practitioners in recruitment campaigns for new Data Scientists or in the planning of new training programs/certifications for current workers. Furthermore, this paper aims to evaluate whether e-CF and SFIA take the Data Scientist profile into consideration as a representative profile within the ICT field of study and practice. We use the knowledge base and skills set identified in the conceptual model to evaluate whether the Data Scientist profile is adequately represented in these frameworks.

This document is structured as follows: Section 2 describes related work; Section 3 summarizes the main knowledge base and skills set related to several academic programs, job opportunities, professional organizations and certifications in Data Science; Section 4 presents the proposed conceptual model; Section 5 discusses the representativeness of the Data Scientist profile in e-CF and SFIA; Section 6 concludes with some remarks about the undertaken work.

Related work

Looking into the scientific literature, some authors have already contributed to a cohesive understanding of the Data Scientist professional profile. Although this professional profile began to gain prominence toward the end of the past decade (Patil, 2011), one of the first scientific publications describing what could possibly be a Data Scientist dates back to 2001, when Cleveland (2001) discussed the expansion of the field of statistics to embrace Data Science. Apart from that, a few

Academic background and professional profile

Since this work is focused on defining a conceptual model for the knowledge base and skills set of a Data Scientist, using the background knowledge already expressed by the scientific community, the next step consists of identifying the concepts related to Data Science that are frequently mentioned in academia and industry.

Conceptual model for the data scientist profile

The main contribution of this work is the proposal of a conceptual model that describes the knowledge base and skills set that characterize a Data Scientist. This conceptual model is the result of the review of several scientific publications, academic background and professional-related publications, coupled with some of the terminology used in the Association for Computing Machinery (ACM) 2012 classification system. It is relevant to highlight that the concepts in the ACM’s

Does the data scientist have an adequate representativeness in e-CF and SFIA?

In this section, the representativeness of the Data Scientist professional profile within e-CF 3.0 and SFIA is analyzed. Since the Data Scientist is a multidisciplinary profile, it is not expected to be found concentrated in only one competence area. Consequently, this analysis considers every competence area, in order to search for examples of knowledge and skills present in a Data Scientist, taking into consideration the focus on data, so, when programming is mentioned within its knowledge base,

Conclusion

This work proposed a conceptual model to describe the professional profile of a Data Scientist. The main findings extracted from the review of scientific literature, academic programs and industry-related content contributed to the proposal of a conceptual model that aims to represent the Data Scientist profile. This model can be used to foster future research and to allow a common understanding regarding the Data Scientist professional profile, helping organizations in future recruitment

Funding

This work was supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT − Fundação para a Ciência e Tecnologia, within the Project UID/CEC/00319/2013 (ALGORITMI). This work has also been funded by the SusCity project (MITP-TB/CS/0026/2013) and by Portugal Incentive System for Research and Technological Development, Project in co-promotion n° 002814/2015 (iFACTORY 2015-2018).

References

  • J.Y. Kim et al. An empirical analysis of requirements for data scientists using online job postings. International Journal of Software Engineering and Its Applications (2016)
  • M. Kim et al. The emerging role of data scientists on software development teams. Presented at the Proceedings–International conference on software engineering (2016)
  • AIS. AIS education map (2016)
  • C.J. Bacon et al. A systemic framework for the field of information systems. SIGMIS Database (2001)
  • W.S. Cleveland. Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review (2001)
  • Cloudera. CCP data scientist (2016)
  • Columbia University. Certification of professional achievement in data sciences (2016)
  • Columbia University. Master of science in data science (2016)
  • C. Costa et al. A conceptual model for the professional profile of a data scientist
  • DSA. Data science association homepage (2016)
  • T.H. Davenport et al. Data scientist. Harvard Business Review (2012)
  • V. Dhar. Data science and prediction. Communications of the ACM (2013)
  • EMC. Data science and big data analytics training and certification (2016)
  • EuADS. European association for data science homepage (2016)
  • Facebook. Data scientist, analytics (Instagram) (2016)
  • Google. Google job opportunity on data science (2016)
  • IBM. What is a data scientist? (2016)