Elsevier

Social Networks

Volume 35, Issue 3, July 2013, Pages 370-381

The use of different data sources in the analysis of co-authorship networks and scientific performance

https://doi.org/10.1016/j.socnet.2013.04.004

Highlights

  • We analyse collaboration style and scientific performance of Italian statisticians.

  • We use three data sources to construct co-authorship networks.

  • We assess network structures of the whole community and of the Statistics subfields.

  • We model the effect of actor network position on scientific performance.

  • We find distinct collaboration patterns and effects on scientific performance.

Abstract

Scientific collaboration is usually derived from archival co-authorship data. Several data sources may be examined, but they all have advantages and disadvantages, especially when a specific discipline or community is of interest. The aim of this paper is to explore the effect of the use of three data sources – Web of Science, Current Index to Statistics and nationally funded research projects – on the analysis of co-authorship networks among Italian academic statisticians. Results provide evidence of our hypotheses on distinct collaboration patterns among statisticians, as well as distinct effects of scientist network positions on scientific performance, by both Statistics subfield and data source.

Introduction

Collaboration in science is a complex phenomenon which affects scientific productivity in various ways (Lee and Bozeman, 2005), as well as knowledge diffusion within and between disciplines. Collaboration is considered to be a key element in the advancement of knowledge, because scientists in collaboration networks share ideas, use similar techniques, and influence each other's work. By means of collaboration, scientists may benefit from both technological expertise and teamwork synergy, thus improving the quality and quantity of their research output. Empirically, collaboration among scientists is increasing in all disciplines (e.g., Babchuk et al., 1999, Glanzel and Schubert, 2004, Kronegger et al., 2011).

In this stream of research, Social Network Analysis (SNA) has become the privileged theoretical and statistical approach to study the typical collaboration patterns within disciplines (for instance, see Burt, 1978/1979, and Moody, 2004 for Sociology; Albert and Barabási, 2002, and Newman, 2004 for Physics and Biomedical research; and Goyal et al., 2006 for Economics). It is straightforward to think about collaboration among scientists as a network, in which the actors are scholars and ties may be represented by various forms of scientific collaboration among them. Thanks to the availability of international bibliographic databases, the most frequent way of specifying such networks is to take into account formal research activities, especially co-authorship (i.e., co-production of scientific publications).
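As an illustration of the network specification described above (scholars as actors, co-authorship as ties), the following is a minimal sketch in Python. It is not the authors' procedure: the function names and the sample author lists are hypothetical, and tie weights are assumed to count jointly authored publications.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship_network(publications):
    """Return weighted undirected ties: {(author_a, author_b): joint papers}."""
    ties = defaultdict(int)
    for authors in publications:
        # Deduplicate and sort so every pair has one canonical orientation.
        for a, b in combinations(sorted(set(authors)), 2):
            ties[(a, b)] += 1
    return dict(ties)


def degree(ties):
    """Return the number of distinct co-authors of each author."""
    neighbours = defaultdict(set)
    for a, b in ties:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {author: len(others) for author, others in neighbours.items()}


# Hypothetical records: one author list per publication.
publications = [
    ["Rossi", "Bianchi"],
    ["Rossi", "Bianchi", "Verdi"],
    ["Verdi"],  # a single-authored paper adds no tie
]
ties = build_coauthorship_network(publications)
degrees = degree(ties)
```

In this sketch an author who only publishes alone never appears in `degrees`, so isolates would have to be added separately if the whole population is of interest.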

The present paper deals with network analysis of co-authorship patterns in Statistics, focusing in particular on the population of academic statisticians in Italy, that is, those scientists classified as belonging to one of the five Statistics subfields: Statistics, Statistics for Experimental and Technological Research, Economic Statistics, Demography, and Social Statistics.

Attention to this community derives from several motivations. Unlike in other disciplines, co-authorship behaviour in Statistics has not yet been investigated. The field of Statistics presents some characteristics common to natural sciences as well as social sciences. Even if it is usually considered in the stream of social sciences – especially in Italian academic tradition – it plays a central role in all sciences in view of the importance of statistical methods in everyday applications. As reported by Leti (2000, p. 188): “The new natural science was made possible by the invention and scientific use of instruments which went beyond man's capabilities in their examination of nature. Similarly, Statistics as a method, by superseding human inability to quantify collective phenomena, permitted greater insight into these phenomena (originally those concerning the state and society). The new natural sciences and Statistics followed the same approach, shared a mathematical basis, and pursued both scientific and practical aims”. Similar arguments are also reported by Kagan (2009), who proposed nine dimensions for comparing research approaches in the natural sciences, social sciences and humanities. Furthermore, although social and natural scientists work both in and outside of traditional lab settings, “the rise of large-scale data collection efforts suggests a team-production model” (Moody, 2004, p. 217) similar to the one that typically characterises the production of scientific output in the natural sciences.

Statistics is also unique with respect to the other social sciences, since several problems in different disciplines may be addressed by its methods (Cox, 1997). Therefore, it is of interest to examine what emerging pattern describes the diffusion of statistical knowledge – although limited to a country level community.

It is relevant to trace this specific target population in international databases of high-impact journals and to reveal how the choice of data source influences the resulting co-authorship patterns. For these purposes, two international databases, one general (Web of Science, WoS) and one thematic (Current Index to Statistics, CIS), are examined here, together with bibliographic information retrieved from the Italian Ministry of University and Research (MIUR) database of nationally funded research projects (PRIN).

We provide several research hypotheses on the resulting collaboration patterns of Italian academic statisticians, regarded as a whole group, and also taking into account the five subfields into which the group is organised. Following seminal papers on co-authorship analysis (in particular, Albert and Barabási, 2002, Moody, 2004, Newman, 2004, Goyal et al., 2006) to allow comparisons, this study adds some substantial elements:

  • it analyses a target population (Italian academic statisticians) involved in a discipline (Statistics) which is not yet fully explored in terms of its scientific collaboration behaviour. In addition, the specialised subfields within the whole discipline may be described by several cooperative patterns, depending on the level of interdisciplinarity characterising scientists’ activities;

  • it considers three data sources. In general, we assume that the collaboration structure, and hence knowledge flows, in scientific communities depends to a great extent on the kinds of publications pertaining to the various archives considered for network construction;

  • it explores the effects of authors’ network positions on scientific performance as measured by the h-index. For this aim, a generalised extreme value distribution (GEV) is fitted, to take into account the particular distribution of this index, which is usually highly skewed and heavy-tailed.
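To make the two quantities in the last point concrete, the sketch below computes an h-index from a list of citation counts and evaluates the GEV distribution function in its standard parameterisation (location mu, scale sigma, shape xi), as given for example in Coles (2001). It is only an illustration under these assumptions: the function names are hypothetical, and the paper's actual model specification and estimation are not reproduced here.

```python
import math


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) = exp(-[1 + xi*(z - mu)/sigma]^(-1/xi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel limit as the shape parameter tends to zero.
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

A positive shape parameter gives the heavy right tail that motivates the choice of this family for a skewed index; the negative and zero cases are included only for completeness.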

The paper is organised as follows: Section 2 presents the framework linking network structures to the diffusion of knowledge in scientific communities, and reports the main empirical results related to network topologies observed in several disciplines. After a description of the data sources used to collect co-authorship data on Italian academic statisticians, Section 3 describes data retrieval and cleansing in detail. Authors’ coverage rates and publication characteristics in the three data sources are presented. Section 4 illustrates our research hypotheses on scientific collaboration patterns and their influence on scientific performance. In Section 5, the co-authorship trend and networks of Italian academic statisticians are analysed and results on highly connected statisticians are given. The relationship between authors’ h-index and their network positions is modelled. Section 6 concludes with a discussion and final remarks.

Section snippets

Co-authorship networks and patterns of collaboration in scientific communities

Scientific collaboration is a mix of informal mechanisms (e.g., advice, face-to-face contacts, exchange of personal knowledge) and formal activities (e.g., writing papers, participating in research projects) among scientists involved in producing knowledge, as suggested in Lievrouw et al. (1987), Liberman and Wolf (1997), and Liberman and Wolf (1998). Direct interviews can be very useful to gain insights on informal collaboration,2

Data sources on co-authorship for Italian academic statisticians

Seminal studies in scientific collaboration are based on international databases containing mainly high-impact publications (for instance, Sociological Abstracts in Moody, 2004, MEDLINE in Newman, 2004, and Econlit in Goyal et al., 2006). These bibliographic databases allow exploration of the collaboration patterns among scientists working on topics covered by the editorial policies on which the archives are based. The advantages of using such data sources are that they are relatively

Co-authorship patterns in Statistics: research hypotheses

Starting from the co-authorship networks derived from the three data sources, we provide evidence on several research hypotheses on scientific collaboration patterns among Italian academic statisticians:

  • H1: The number of co-authored publications by Italian academic statisticians is growing faster than the number of single-authored publications, as observed in other scientific disciplines.

    The probability of co-authoring differs across disciplines and over time but, in the last few decades, it

Analysis of co-authorship of Italian academic statisticians

In the following we present both collaboration trend and network analysis results for Italian academic statisticians related to our research hypotheses.
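A collaboration trend of the kind addressed by H1 can be summarised as the yearly share of co-authored publications. The sketch below assumes each record is a hypothetical (year, number of authors) pair; the sample values are invented for illustration and are not taken from WoS, CIS or PRIN.

```python
from collections import defaultdict


def coauthored_share_by_year(records):
    """Return {year: fraction of that year's publications with 2+ authors}."""
    totals = defaultdict(int)
    joint = defaultdict(int)
    for year, n_authors in records:
        totals[year] += 1
        if n_authors > 1:
            joint[year] += 1
    return {year: joint[year] / totals[year] for year in sorted(totals)}


# Hypothetical records: (publication year, number of authors).
records = [
    (1990, 1), (1990, 1), (1990, 2),
    (2010, 3), (2010, 2), (2010, 1), (2010, 4),
]
shares = coauthored_share_by_year(records)
```

A share that rises across years would be consistent with H1, although a formal comparison of growth rates would need more than this descriptive summary.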

Discussion and concluding remarks

This study focuses on the co-authorship patterns of the community of Italian academic statisticians as they emerge from three data sources which contain different kinds of scientific publications. A different coverage rate was obtained from the three data sources for all statisticians, and in particular for some subfields. As a general finding, in international databases, Demography, Economic Statistics and Social Statistics have low author coverage rates.

The whole bulk of results on Italian

Acknowledgements

The authors would like to thank Francesco Pauli (University of Trieste) for his useful suggestions in GEV model estimation, the MIUR for PRIN data source availability, and the editor and the anonymous reviewers for their helpful comments.

References (43)

  • S. Liberman et al.

    The flow of knowledge: scientific contacts in formal meetings

    Social Networks

    (1997)
  • S. Liberman et al.

    Bonding number in scientific disciplines

    Social Networks

    (1998)
  • L.A. Lievrouw et al.

    Triangulation as a research strategy for identifying invisible colleges among biomedical scientists

    Social Networks

    (1987)
  • P.T. Nicholls

    Empirical validation of Lotka's law

    Information Processing and Management

    (1986)
  • R. Albert et al.

    Statistical mechanics of complex networks

    Reviews of Modern Physics

    (2002)
  • N. Babchuk et al.

    Collaboration in sociology and other scientific disciplines: a comparative trend analysis of scholarship in the social, physical and mathematical sciences

    The American Sociologist

    (1999)
  • A. Baccini et al.

    How are statistical journals linked? A network analysis

    Chance

    (2009)
  • J. Beirlant et al.

    Asymptotics for the Hirsch Index

    Scandinavian Journal of Statistics

    (2010)
  • C. Calero et al.

    How to identify research groups using publication analysis: an example in the field of nanotechnology

    Scientometrics

    (2006)
  • A. Clauset et al.

    Power-law distributions in empirical data

    SIAM Review

    (2009)
  • S. Coles

    An Introduction to Statistical Modeling of Extreme Values

    (2001)