What we talk about when we talk about (big) data

doi:10.1016/j.jsis.2018.10.005

The Journal of Strategic Information Systems

Volume 28, Issue 1, March 2019, Pages 3-16


Abstract

In common with much contemporary discourse around big data, recent discussion of datafication in the Journal of Strategic Information Systems has focused on its effects on individuals, organisations and society. Generally missing from such analysis, however, is any consideration of data themselves. What is it that is having these effects? In this Viewpoint article I therefore present a critical analysis of a number of widely-held assumptions about data in general and big data in particular. Rather than being a referential, natural, foundational, objective and equal representation of the world, it will be argued, data are partial and contingent and are brought into being through situated practices of conceptualization, recording and use. Big data are also not as revolutionary, voluminous, universal or exhaustive as they are often presented. Some initial implications of this reconceptualization of data are explored. A distinction is made between “data in principle”, as they are recorded, and “data in practice”, as they are used. It is only the latter, typically a small and not necessarily representative subset of the former, that will contribute directly to the effects of datafication.

Introduction

Data and their effects on individuals, organisations, business models and society have, rightly, attracted growing attention in the Journal of Strategic Information Systems (Newell and Marabelli, 2015, Loebbecke and Picot, 2015, Günther et al., 2017, Markus, 2017). The immediate prompt for this attention has been the “widespread diffusion of digital devices that have the ability to monitor our everyday lives” (Newell and Marabelli, 2015: 3), a process that is referred to as “datafication”. Discussions of this phenomenon, however, have largely taken the data themselves for granted and have focused on how data “are being used, and by whom and with what consequences” (Newell and Marabelli, 2015: 3). While, as Galliers et al. (2017) argue, the uses of data raise important questions that deserve the attention of scholars in the IS field (and more widely), in this Viewpoint paper I would like to switch the focus around and consider what constitutes the data whose accumulated effects we have begun to explore. What actually is it that is having these effects?

This enquiry will critically examine a number of commonly held, and often implicit, assumptions about the nature of data. In doing so I hope to extend the discussion of the datafication phenomenon beyond “its issues, impacts and implications” (Galliers et al., 2017: 188) to include an awareness of the particular character of the ‘material’ on which this phenomenon is based. A better appreciation of this character, it will be argued, may inform a richer understanding of the effects of datafication and open them to more rigorous scrutiny.

What has given questions about the nature of data a particular relevance, of course, is not just the increasing datafication of contemporary life. Rather, it is the accumulation of these data in repositories, the analysis of which, often by “pre-determined algorithms that lead to decisions that follow on directly without further human intervention” (Galliers et al., 2017: 185), is seen as transforming work, organisations and society, a development commonly referred to as “big data”.

Although, as will be discussed, “big data” are not necessarily a product of datafication and the meaning of the term itself is highly contested, there is little doubt both that the volume of data being created has greatly expanded in recent years and that techniques to analyse data at this scale have significantly advanced. The paper will therefore also examine assumptions that relate specifically to these accumulated data, not just to data themselves.


Assumptions about data

Discussions of data are bedevilled by inconsistencies in the way in which the term is defined in the literature. A Delphi study of information scientists by Zins (2007), for example, yielded more than 40 different definitions, while Checkland and Holwell (1998) list seven different definitions from IS textbooks. Although it is certainly beyond the scope of this paper to propose a definitive analysis of these definitions, it would nevertheless seem important to clarify some of the main

Assumptions about big data

The starting point for much discussion of big data is typically a reference to the increasing volume and velocity of data “flowing into every area of the global economy” (Manyika et al., 2011). This is often illustrated by quoting very large numbers, with exa, peta and tera prefixes, describing how many Facebook posts or Google searches are made every minute, or comparing the volume of data produced between the dawn of civilisation and the early 2000s and the amount of data now

Questioning data

To provide a common reference point for the examination of data, the discussion will draw on examples from ongoing research on the implementation and use of electronic medical records in acute hospitals, and particularly in critical care. Electronic medical records are also widely considered to be prime targets for “big data” initiatives (Groves et al., 2013, Raghupathi and Raghupathi, 2014). This is not to suggest that these examples will be representative of all data, but that they highlight

How data come to be

Looking first at how data are produced, it would seem reasonable to maintain the assumption that data are generally intended to be referential. That is, with maybe a few exceptions, data are collected and used on the basis that they tell us something (although perhaps not everything) about the world. The initial stage in the creation of data, therefore, involves a decision on the phenomenon that they are considered to be a representation of. This decision does not necessarily have a single

How data come to be used

Even the presence of data in the record, however, does not necessarily equate to what actually gets used as data about a phenomenon and there is a further process that mediates between the two. This may also be broken down into a number of stages as shown in Fig. 3.

A necessary starting point for the use of data would seem to be some demand that they are perceived to fulfil. A clinician treating a patient, for example, seeks data that will help them in their task. The specific data they look for

Discussion and conclusion

If we consider data not as givens that are out there in the world, waiting to be gathered, but as contingent representations that are brought into being through situated practices of conceptualization, recording and use, then what might this mean for our understanding of datafication and of big data more generally? While a complete answer to this question is clearly beyond the scope of this initial account of data in practice, four broad domains of implications may be identified.

The first of

Acknowledgements

The ideas presented in this paper were developed as part of the ReCliC project on the repurposing of clinical data for quality improvement in critical care, a collaboration between the Judge Business School and the Computer Laboratory at the University of Cambridge and Royal Papworth Hospital, funded by the Health Foundation, an independent charity working to improve the quality of healthcare in the UK.

References (64)

A. Aaltonen et al. Everything counts in large amounts: a critical realist case study on data-based production. J. Inf. Technol. (2014).
R. Ackoff. From data to wisdom. J. Appl. Syst. Anal. (1989).
C. Anderson. The Petabyte age: because more isn’t just more – more is different. Wired (2008).
C. Anderson. The end of theory: the data deluge makes the scientific method obsolete. Wired (2008).
P. Bocij et al. Business Information Systems: Technology, Development and Management for the e-Business (2006).
D. Boyd et al. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. (2012).
P. Checkland et al. Information, Systems and Information Systems: Making Sense of the Field (1998).
I.D. Constantiou et al. New games, new rules: big data and the changing context of strategy. J. Inf. Technol. (2015).
K. Cukier et al. The rise of big data: how it’s changing the way we think about the world. Foreign Aff. (2013).
S. Custer et al. Pork to Performance: Open Government and Program Performance Tracking in the Philippines – Phase two (2016).
S. Custer, T. Sethi, J. Custer. Avoiding data graveyards: how can we overcome barriers to data use? (2017). Available...
T.H. Davenport et al. Information Ecology (1997).
K. Driscoll et al. Working within a black box: transparency in the collection and production of big twitter data. Int. J. Commun. (2014).
P.F. Drucker. The coming of the new organisation. Harvard Bus. Rev. (1998).
F. Endel et al. Data Wrangling: Making data useful again. IFAC-PapersOnLine (2015).
M.S. Feldman et al. Theorizing practice and practicing theory. Org. Sci. (2011).
L. Fleck. Genesis and Development of a Scientific Fact (2012).
M. Foucault. Discipline and Punish: The Birth of the Prison (1977).
M. Frické. The knowledge pyramid: a critique of the DIKW hierarchy. J. Inf. Sci. (2009).
M. Frické. Big data and its epistemology. J. Assoc. Inf. Sci. Technol. (2015).
R.D. Galliers et al. Datification and its human, organizational and societal effects: the strategic opportunities and challenges of algorithmic decision-making. J. Strateg. Inf. Syst. (2017).
L. Gitelman et al. Introduction.
P. Groves et al. The ‘big data’ revolution in healthcare. McKinsey Quart. (2013).
W.A. Günther et al. Debating big data: a literature review on realizing value from big data. J. Strateg. Inf. Syst. (2017).
S. Haag et al. Management Information Systems for the Information Age (2013).
S. Hornung. Beyond “New Scientific Management?” Critical reflections on the epistemology of Evidence-based Management.
K. Hosanagar et al. We need transparency in algorithms, but too much can backfire. Harvard Bus. Rev. Digital Art. (2018).
R. Kitchin. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences (2014).
R. Kitchin. Big data, new epistemologies and paradigm shifts. Big Data Soc. (2014).
C. Loebbecke et al. Reflections on societal and business model transformation arising from digitization and big data analytics: a research agenda. J. Strateg. Inf. Syst. (2015).
M.L. Markus. Datification, organizational strategy, and IS research: what’s the score? J. Strateg. Inf. Syst. (2017).
S. Newell et al. Strategic opportunities (and challenges) of algorithmic decision-making: a call for action on the long-term societal effects of “datification”. J. Strateg. Inf. Syst. (2015).
