Fostering scientists’ data sharing behaviors via data repositories, journal supplements, and personal communication methods

https://doi.org/10.1016/j.ipm.2017.03.003

Abstract

The purpose of this study is to examine how institutional pressures, individual motivations, and resources affect scientists’ diverse data sharing behaviors, including (a) making data accessible through data repositories, (b) submitting data as journal supplements, and (c) providing data via personal communication methods upon request. A theoretical framework integrating institutional theory and the theory of planned behavior was used to create a research model that presents how scientists decide to share data in diverse ways, and how the data sharing factors differ across those behaviors. A survey method was employed to evaluate the research model, using multivariate regression analysis on a total of 2172 survey responses in the U.S. The results show dynamic relationships between diverse data sharing factors and different forms of data sharing behaviors. For data sharing via data repositories, journal pressure, perceived effort, and availability of data repositories are significant factors; for data sharing through journal supplements, journal pressure, perceived career benefit, perceived effort, and availability of data repositories are significant factors; for personal data sharing, funding agency pressure, normative pressure, perceived career risk, perceived effort, and availability of data repositories are significant factors. This research suggests to funding agencies, journal publishers, and scientific communities that different strategies need to be employed to promote different forms of data sharing behaviors.

Introduction

Data sharing is essential to contemporary scientific research from the perspectives of e-Science and open science. The term e-Science is defined as “networked and data-driven science” (Hey & Hey, 2006), and a critical aspect of it is global collaboration in key areas of science, enabled by data-centric research based on shared data sets (Hey & Trefethen, 2002). Open science refers to conducting research in a collaborative manner by sharing and reusing research data and relevant materials (FOSTER, 2016). Both e-Science and open science promise to reshape and enhance the way science is done by empowering data-driven scientific research and improving the synthesis and analysis of scientific data in a collaborative fashion (Atkins, 2006, Molloy, 2011). Recent advances in information and communication technologies, such as data repositories and personal communication methods, have enabled scientists to share their research data along with their research publications, thus achieving the core vision of e-Science and open science, which is data-driven science based on shared data sets.

Traditionally, formal scholarly communication is based on journal articles, conference proceedings, and sometimes article preprints. Recently, however, original research data has taken its place in formal scholarly communication. Scientific research now requires original data sets for diverse purposes such as large-scale computation, comparative research, or replication of previous works for further research. As primary data becomes important in terms of e-Science and open science, data sharing becomes critical to scientific research (Borgman, 2012, Tenopir et al., 2015). Scientific communities have developed diverse data repositories, and scientists have become more aware of the importance of data sharing (Borgman, 2010, Gewin, 2016, Tenopir et al., 2011, Tenopir et al., 2015). Furthermore, national funding agencies in the U.S. such as the National Science Foundation (NSF), National Institutes of Health (NIH), and National Cancer Institute (NCI) have mandated data sharing in many disciplines as a part of their grant requirements (NCI 2006, NIH 2007, NSF 2012); funding agencies in the European Union (EU) and the United Kingdom (UK), such as the European Research Council (ERC), Research Councils UK (RCUK), and the Wellcome Trust (WT), have also promoted open science research based on shared data sets and research articles (ERC 2012, RCUK 2016, WT 2015). In addition, a number of journals have implemented data sharing policies (Piwowar and Chapman, 2008, Savage and Vickers, 2009). Despite this, data sharing is still not established as a common research practice across disciplines (Tenopir et al., 2011, Tenopir et al., 2015, Wallis et al., 2013).

Technological infrastructures, institutional set-ups, and individual motivations often contribute to scientists’ data sharing behaviors. Contemporary collaboration in science and engineering fields requires the orchestration of technological infrastructure, institutional support, and interpersonal interactions (Kim & Stanton, 2012). Similarly, scientists’ data sharing, as a microcosm of contemporary collaboration, involves the same three areas of infrastructure, institutions, and people. Individual scientists are embedded in institutional contexts, including belonging to universities and academic disciplines, and drawing support from organizational and disciplinary technological infrastructure. This research considers the combination of infrastructure, institution, and people as important components influencing scientists’ data sharing, and examines how those factors influence diverse forms of data sharing behaviors.

The data sharing behaviors of scientists occur in diverse forms, including uploading data in data repositories, submitting data as journal supplements, and providing data via personal communication methods upon request. Since each discipline has its own data sharing practices, scientists’ data sharing differs across disciplines. Furthermore, even in the same discipline, scientists’ data sharing behaviors can vary because of their technological infrastructures, institutional set-ups, and individual expectations. Therefore, it is very important to understand how diverse data sharing factors, including institutional pressures, individual motivations, and technological resources, all affect scientists’ data sharing behaviors. In this research, the diverse forms of data sharing behaviors are categorized into three different actions: (a) making data accessible through data repositories, (b) submitting data as journal supplements, and (c) providing data via personal communication methods upon request.

This research investigates how institutional, individual, and resource factors all map to scientists’ data sharing behaviors. It focuses on scientists in STEM (Science, Technology, Engineering, and Mathematics) disciplines, where data sharing and reuse have become more important because of institutional policies, technological infrastructure, and scientists’ awareness (Kim and Stanton, 2016, Kim and Zhang, 2015). This research assumes that the data sharing behaviors of scientists are not a matter of an individual scientist's arbitrary choice, but rather, that decisions as to whether to share data with researchers outside of their research group reflect the choices of communities of colleagues embedded within their disciplines. Therefore, it is necessary to investigate how the combinations of institutional, individual, and resource factors influence scientists’ diverse forms of data sharing behaviors. This investigation provides a holistic view of the institutional, individual, and resource factors influencing scientists’ diverse forms of data sharing in different institutional settings.

Section snippets

Literature review

Research data (data in general) refer to the extensive range of relevant information about research processes and results. Individual researchers or groups of researchers collect data using diverse collection methods including observations, experiments, and simulations. In this research, “data sharing” is defined as scientists’ providing the research data behind their published article(s) to other researchers in “diverse forms,” including (a) making data accessible through data repositories,

Theoretical framework

In order to fully understand scientists’ data sharing, it is necessary to consider how individual scientists make their decisions regarding various data sharing behaviors. This research assumes that institutional pressures, technological infrastructures, and individual motivations may influence an individual scientist's decision making regarding different forms of data sharing. Scientists make their decisions in the context of their roles in relation to universities, professional associations,

Research model & hypotheses development

The research model provides an extensive map of scientists’ data sharing behaviors based on the combination of institutional theory and the theory of planned behavior. The research model explains how scientists make their own decisions to use diverse methods to share data, and how the data sharing factors differ across the diverse forms of data sharing behaviors. From the perspective of institutional theory, the data sharing behaviors of scientists can be influenced by regulative pressures from

Research methods

This research employed a survey method to validate the research model of scientists’ data sharing behaviors and to evaluate the hypothesized relationships in the model. The survey was conducted with researchers in STEM disciplines, such as physical sciences, biological sciences, engineering, health sciences, and social sciences. The survey method helped to examine to what extent institutional, individual, and resource factors influence scientists’ diverse forms of data sharing behaviors.
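As an illustration of the kind of analysis described here, the following minimal Python sketch fits a multi-outcome linear regression by ordinary least squares. It is not the study's actual analysis or estimator: the data are synthetic, the coefficient values are invented purely for the demonstration, and the predictor and outcome columns are only labeled after factors and behaviors named in the abstract. The function name `fit_multivariate_ols` is hypothetical.

```python
import numpy as np


def fit_multivariate_ols(X, Y):
    """Least-squares fit of several outcome columns on the same predictors.

    X: (n_respondents, n_predictors), Y: (n_respondents, n_outcomes).
    Returns coefficients of shape (1 + n_predictors, n_outcomes);
    row 0 holds the intercepts.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef


# Synthetic, illustrative data only (NOT the survey data from the study).
rng = np.random.default_rng(0)
n = 500
# Assumed predictor columns: journal pressure, perceived effort,
# availability of data repositories.
X = rng.normal(size=(n, 3))
# Invented coefficients; columns stand for repository, journal supplement,
# and personal sharing.
true_coef = np.array([
    [0.4, 0.3, 0.0],
    [-0.2, -0.1, -0.3],
    [0.5, 0.2, 0.1],
])
Y = X @ true_coef + rng.normal(scale=0.1, size=(n, 3))

coef = fit_multivariate_ols(X, Y)
print(np.round(coef, 2))
```

Each column of the returned matrix corresponds to one form of data sharing, so the same predictor can be compared across the three behaviors, which is the comparison the research model is concerned with.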

Scale assessment

Since this research employed a survey as a main method, issues of measurement reliability and validity are all important. This research ensured reliability in terms of test-retest issues and internal consistency by using well-developed items from prior studies (Kim & Stanton, 2012, 2016). Also, reliability assessment for each construct was conducted by checking internal consistency of variables with Cronbach's alpha. All the constructs have Cronbach's alpha coefficients ranging from 0.871 to
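For readers unfamiliar with the internal-consistency check named above, Cronbach's alpha for a construct measured by k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the summed score). A minimal Python sketch follows. The function name `cronbach_alpha` and the tiny response matrix are hypothetical illustrations, not the study's instrument or data.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the construct
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


# Hypothetical responses: 3 respondents answering 2 items.
responses = np.array([[1, 2], [2, 1], [3, 3]])
print(round(cronbach_alpha(responses), 4))
```

Values closer to 1 indicate that the items of a construct vary together, which is what the reliability assessment in this section checks for each construct.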

Discussion

This research demonstrates that diverse data sharing factors, including institutional pressures, individual motivations, and resources, affect data sharing behaviors via data repositories, journal supplements, and personal communication methods. Fig. 2 shows the summary of research findings based on the evaluation of the research model.

While the specific results of this research add to the understanding of data sharing behavior via a variety of methods, they also raise a number of new and important

Conclusion

This research revealed the dynamics of relationships between different data sharing factors and diverse forms of data sharing behaviors via data repositories, journal supplements, and personal communication methods. The results of this research suggest that institutional pressures, individual motivations, and resource factors all have significant impacts on scientists’ diverse forms of data sharing behaviors at different levels. This research provided a nice road map about how scientific

Note

Both survey data and instrument have been made publicly available via Open ICPSR (Inter-university Consortium for Political and Social Research) and can be accessed at http://doi.org/10.3886/E100087V7.

Acknowledgments

I would like to acknowledge the ProQuest Pivot for allowing me to use its Community of Scientists (CoS) Scholar Database in recruiting the survey participants. I also would like to acknowledge Darra Hofman and anonymous reviewers for reviewing this article and providing constructive feedback.

References (77)

  • I. Ajzen. Perceived behavioral control, self-efficacy, locus of control, and the theory of planned behavior. Journal of Applied Social Psychology (2002)
  • I. Ajzen et al. The influence of attitudes on behavior
  • D.E. Atkins. Cyberinfrastructure for research. Issues in Science and Technology (2006)
  • J. Battilana. Agency and institutions: The enabling role of individuals' social position. Organization (2006)
  • D. Blumenthal et al. Data withholding in genetics and the other life sciences: Prevalences and predictors. Academic Medicine (2006)
  • C.L. Borgman. Scholarship in the digital age: Information, infrastructure, and the internet (2007)
  • C.L. Borgman. The digital future is now: A call to action for the humanities. Digital Humanities Quarterly (2009)
  • C.L. Borgman. Research data: Who will share what, with whom, when, and why? Paper presented at the Fifth China – North America Library Conference, Beijing, China (2010)
  • C.L. Borgman. The conundrum of sharing research data. Journal of the American Society for Information Science and Technology (2012)
  • K. Briney et al. Do you have an institutional data policy? A review of the current landscape of library data services and institutional data policies. Journal of Librarianship and Scholarly Communication (2015)
  • C. Brown. The changing face of scientific discourse: Analysis of genomic and proteomic database usage and acceptance. Journal of the American Society for Information Science and Technology (2003)
  • A. Budros. The mean and lean firm and downsizing: Causes of involuntary and voluntary downsizing strategies. Sociological Forum (2002)
  • E.G. Campbell et al. Data-sharing and data-withholding in genetics and the life sciences: Results of a national survey of technology transfer officers. Journal of Health Care Law and Policy (2003)
  • E.G. Campbell et al. Data withholding in academic genetics – Evidence from a national survey. Journal of the American Medical Association (2002)
  • G.S. Choudhury. Case study in data curation at Johns Hopkins University. Library Trends (2008)
  • M.H. Cragin et al. Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences (2010)
  • A.W. Crall et al. Improving and integrating data on invasive species collected by citizen scientists. Biological Invasions (2010)
  • P.J. DiMaggio et al. The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review (1983)
  • ERC. (2012). Commission recommendation on access to and preservation of scientific information. Retrieved from...
  • I.M. Faniel et al. Social scientists' satisfaction with data reuse. Journal of the Association for Information Science and Technology (2016)
  • L.M. Federer et al. Biomedical data sharing and reuse: Attitudes and practices of clinical and scientific research staff. PLoS ONE (2015)
  • A. Field. Discovering statistics using SPSS (2009)
  • S.E. Fienberg. Sharing statistical data in the biomedical and health sciences: Ethical, institutional, legal, and professional dimensions. Annual Review of Public Health (1994)
  • FOSTER. (2016). Open science definition. Retrieved from...
  • D. Gefen et al. Structural equation modeling and regression: Guidelines for research practice. Communications of the AIS (2000)
  • V. Gewin. Data sharing: An open mind on open data. Nature (2016)
  • R. Grewal et al. The role of the institutional environment in marketing channels. Journal of Marketing (2002)
  • J.F. Hair et al. Multivariate data analysis (2006)