The use of different data sources in the analysis of co-authorship networks and scientific performance
Introduction
Collaboration in science is a complex phenomenon which affects scientific productivity in various ways (Lee and Bozeman, 2005), as well as knowledge diffusion within and between disciplines. Collaboration is considered to be a key element in the advancement of knowledge, because scientists in collaboration networks share ideas, use similar techniques, and influence each other's work. By means of collaboration, scientists may benefit from both technological expertise and teamwork synergy, thus improving the quality and quantity of their research output. Empirical evidence shows that collaboration among scientists is increasing in all disciplines (e.g., Babchuk et al., 1999; Glanzel and Schubert, 2004; Kronegger et al., 2011).
In this stream of research, Social Network Analysis (SNA) has become the privileged theoretical and statistical approach to study the typical collaboration patterns within disciplines (for instance, see Burt, 1978/1979, and Moody, 2004 for Sociology; Albert and Barabási, 2002, and Newman, 2004 for Physics and Biomedical research; and Goyal et al., 2006 for Economics). It is straightforward to think about collaboration among scientists as a network, in which the actors are scholars and ties may be represented by various forms of scientific collaboration among them. Thanks to the availability of international bibliographic databases, the most frequent way of specifying such networks is to take into account formal research activities, especially co-authorship (i.e., co-production of scientific publications).
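As an illustration of how such a network can be specified from bibliographic records, the following minimal sketch builds a weighted, undirected co-authorship network from a list of publications. The author names, the `publications` list, and the function names are hypothetical and serve only as an example; they are not drawn from the data sources analysed in this paper.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(publications):
    """Map each unordered author pair to its number of joint publications."""
    edges = Counter()
    for authors in publications:
        # De-duplicate and sort so that every pair has one canonical form.
        unique = sorted(set(authors))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Count the distinct co-authors of every author that has at least one tie."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {author: len(others) for author, others in neighbours.items()}


# Hypothetical records: each inner list holds the authors of one publication.
publications = [
    ["Rossi", "Bianchi"],
    ["Rossi", "Bianchi", "Verdi"],
    ["Verdi"],  # single-authored: contributes no tie
]

edges = coauthorship_edges(publications)
deg = degree(edges)
```

Here the tie weight is the number of joint publications, which is only one of several possible choices; a binary tie (co-authored at least once) would be obtained by ignoring the counts.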
The present paper deals with network analysis of co-authorship patterns in Statistics, focusing in particular on the population of academic statisticians in Italy, that is, those scientists classified as belonging to one of the five Statistics subfields: Statistics, Statistics for Experimental and Technological Research, Economic Statistics, Demography, and Social Statistics.
Attention to this community derives from several motivations. Unlike in other disciplines, co-authorship behaviour in Statistics has not yet been investigated. The field of Statistics shares some characteristics with both the natural and the social sciences. Even if it is usually placed among the social sciences – especially in the Italian academic tradition – it plays a central role in all sciences in view of the importance of statistical methods in everyday applications. As reported by Leti (2000, p. 188): “The new natural science was made possible by the invention and scientific use of instruments which went beyond man's capabilities in their examination of nature. Similarly, Statistics as a method, by superseding human inability to quantify collective phenomena, permitted greater insight into these phenomena (originally those concerning the state and society). The new natural sciences and Statistics followed the same approach, shared a mathematical basis, and pursued both scientific and practical aims”. Similar arguments are also reported in Kagan (2009), who proposed nine dimensions to compare research approaches in the natural sciences, social sciences and humanities. Furthermore, although social and natural scientists work both in and outside of traditional lab settings, “the rise of large-scale data collection efforts suggests a team-production model” (Moody, 2004, p. 217) similar to the one that typically characterises the production of scientific output in the natural sciences.
Statistics is also unique with respect to the other social sciences, since several problems in different disciplines may be addressed by its methods (Cox, 1997). Therefore, it is of interest to examine what emerging pattern describes the diffusion of statistical knowledge – although limited to a country level community.
It is relevant to trace this specific target population in international databases of high-impact journals and to reveal how distinct data sources influence the resulting co-authorship patterns. For these purposes, two international databases, one general (Web of Science, WoS) and one thematic (Current Index to Statistics, CIS), are examined here, together with bibliographic information retrieved from the Italian Ministry of University and Research (MIUR) database of nationally funded research projects (PRIN).
We provide several research hypotheses on the resulting collaboration patterns of Italian academic statisticians, regarded as a whole group, and also taking into account the five subfields into which the group is organised. Following seminal papers on co-authorship analysis (in particular, Albert and Barabási, 2002, Moody, 2004, Newman, 2004, Goyal et al., 2006) to allow comparisons, this study adds some substantial elements:
- it analyses a target population (Italian academic statisticians) involved in a discipline (Statistics) which is not yet fully explored in terms of its scientific collaboration behaviour. In addition, the specialised subfields within the whole discipline may be described by several cooperative patterns, depending on the level of interdisciplinarity characterising scientists’ activities;
- it considers three data sources. In general, we assume that the collaboration structure, and hence knowledge flows, in scientific communities depends to a great extent on the kinds of publications pertaining to the various archives considered for network construction;
- it explores the effects of authors’ network positions on scientific performance as measured by the h-index. For this aim, a generalised extreme value distribution (GEV) is fitted, to take into account the particular distribution of this index, which is usually highly skewed and heavy-tailed.
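To make the two quantities named in the last point concrete, the sketch below computes the h-index from a list of citation counts and evaluates the GEV distribution function in its standard location-scale-shape form. It is an illustration only: the function names and the example values are hypothetical, and the parameterisation (`mu`, `sigma`, `xi`) is the textbook one, which may differ from the specification adopted in the model of Section 5.

```python
import math


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0:
        # Gumbel limit of the GEV family.
        return math.exp(-math.exp(-z))
    t = 1 + xi * z
    if t <= 0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1 / xi)))


# Hypothetical citation counts for one author.
example_h = h_index([10, 8, 5, 4, 3])
```

A positive shape parameter `xi` gives a heavy right tail, which is the feature that makes the GEV family a candidate for a skewed index such as the h-index.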
The paper is organised as follows: Section 2 presents the framework linking network structures to the diffusion of knowledge in scientific communities, and reports the main empirical results related to network topologies observed in several disciplines. After a description of the data sources used to collect co-authorship data on Italian academic statisticians, Section 3 describes data retrieval and cleansing in detail. Authors’ coverage rates and publication characteristics in the three data sources are presented. Section 4 illustrates our research hypotheses on scientific collaboration patterns and their influence on scientific performance. In Section 5, the co-authorship trend and networks of Italian academic statisticians are analysed and results on highly connected statisticians are given. The relationship between authors’ h-index and their network positions is modeled. Section 6 concludes, with a discussion and final remarks.
Section snippets
Co-authorship networks and patterns of collaboration in scientific communities
Scientific collaboration is a mix of informal mechanisms (e.g., advice, face-to-face contacts, exchange of personal knowledge), and formal activities (e.g., writing papers, participating in research projects) among scientists involved in producing knowledge, as suggested in Lievrouw et al. (1987), Liberman and Wolf (1997), and Liberman and Wolf (1998). Direct interviews can be very useful to gain insights on informal collaboration,
Data sources on co-authorship for Italian academic statisticians
Seminal studies in scientific collaboration are based on international databases containing mainly high-impact publications (for instance, Sociological Abstracts in Moody, 2004, MEDLINE in Newman, 2004, and Econlit in Goyal et al., 2006). These bibliographic databases allow exploration of the collaboration patterns among scientists working on topics covered by the editorial policies on which the archives are based. The advantages of using such data sources are that they are relatively
Co-authorship patterns in Statistics: research hypotheses
Starting from the co-authorship networks derived from the three data sources, we provide evidence on several research hypotheses on scientific collaboration patterns among Italian academic statisticians:
- H1: The number of co-authored publications by Italian academic statisticians is growing faster than the number of single-authored publications, as observed in other scientific disciplines.
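One simple way to look at H1 is to track, year by year, the share of publications with two or more authors. The sketch below does this for a hypothetical list of `(year, number_of_authors)` records; the data and the function name are illustrative assumptions, and a rising share is only one possible indicator, not necessarily the measure used in this study.

```python
from collections import defaultdict


def coauthored_share_by_year(records):
    """Return {year: fraction of that year's publications with >= 2 authors}."""
    totals = defaultdict(int)
    joint = defaultdict(int)
    for year, n_authors in records:
        totals[year] += 1
        if n_authors >= 2:
            joint[year] += 1
    return {year: joint[year] / totals[year] for year in sorted(totals)}


# Hypothetical records: (publication year, number of authors).
records = [
    (1990, 1), (1990, 1), (1990, 2),
    (2010, 3), (2010, 2), (2010, 1), (2010, 2),
]

shares = coauthored_share_by_year(records)
```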
The probability of co-authoring differs across disciplines and over time but, in the last few decades, it
Analysis of co-authorship of Italian academic statisticians
In the following we present both collaboration trend and network analysis results for Italian academic statisticians related to our research hypotheses.
Discussion and concluding remarks
This study focuses on the co-authorship patterns of the community of Italian academic statisticians as they emerge from three data sources which contain different kinds of scientific publications. A different coverage rate was obtained from the three data sources for all statisticians, and in particular for some subfields. As a general finding, in international databases, Demography, Economic Statistics and Social Statistics have low author coverage rates.
The whole bulk of results on Italian
Acknowledgements
The authors would like to thank Francesco Pauli (University of Trieste) for his useful suggestions in GEV model estimation, the MIUR for PRIN data source availability, and the editor and the anonymous reviewers for their helpful comments.
References (43)
- et al. Identifying the effects of co-authorship networks on the performance of scholars: a correlation and regression analysis of performance measures and social network analysis measures. Journal of Informetrics (2011)
- Getting funded. Multi-level network of physicists in Italy. Social Networks (2012)
- et al. The asymptotic number of labelled graphs with given degree sequence. Journal of Combinatorial Theory A (1978)
- Stratification and prestige among elite experts in methodological and mathematical sociology circa 1975. Social Networks (1978/1979)
- et al. The h-index: advantages, limitations and its relation with other bibliometric indicators at the micro level. Journal of Informetrics (2007)
- Community detection in graphs. Physics Reports (2010)
- et al. Connectivity in a citation network: the development of DNA theory. Social Networks (1989)
- et al. Social networks as normal science. Social Networks (1993)
- et al. On co-authorship for author disambiguation. Information Processing and Management (2009)
- et al. What is research collaboration? Research Policy (1997)
- The flow of knowledge: scientific contacts in formal meetings. Social Networks
- Bonding number in scientific disciplines. Social Networks
- Triangulation as a research strategy for identifying invisible colleges among biomedical scientists. Social Networks
- Empirical validation of Lotka's law. Information Processing and Management
- Statistical mechanics of complex networks. Review of Modern Physics
- Collaboration in sociology and other scientific disciplines: a comparative trend analysis of scholarship in the social, physical and mathematical sciences. The American Sociologist
- How are statistical journals linked? A network analysis. Chance
- Asymptotics for the Hirsch Index. Scandinavian Journal of Statistics
- How to identify research groups using publication analysis: an example in the field of nanotechnology. Scientometrics
- Power-law distributions in empirical data. SIAM Review
- An Introduction to Statistical Modeling of Extreme Values