Critical thresholds for co-citation clusters and emergence of the giant component

doi:10.1016/j.joi.2009.05.001

Journal of Informetrics

Volume 3, Issue 4, October 2009, Pages 332-340

https://doi.org/10.1016/j.joi.2009.05.001

Abstract

The behavior of co-citation clusters is studied over a wide range of similarity values, and we demonstrate the existence of critical or percolation transitions marked by a sudden expansion of cluster size with a small decrease in similarity which, in most cases, reflects the emergence of a giant component on the overall graph for the dataset. The study was motivated by the question of how to set appropriate thresholds for delineating individual research areas that identify, as far as possible, natural boundaries, in view of the fact that a threshold or criterion appropriate for one area may not be appropriate for another. We explore the rate of change in cluster size as a possible boundary indicator. The relationship of this critical behavior to maps of science is discussed.

Background

From a mathematical point of view, a co-citation network is a one-mode projection of a bipartite network consisting of two types of vertices, citing documents and cited documents, analogous to co-author networks consisting of author and document vertices (Newman et al., 2001). In forming a co-citation network we usually begin by selecting a sample of cited documents and then compute all the co-citations between the document pairs. A coefficient of similarity is defined on each pair, such as

Data

The datasets used in this study each consist of about 50,000 highly cited papers which constitute the top 1% of papers by citation count for individual years and in each of 22 broad fields covering a specific six-year rolling period. The same six-year period was used for both the cited and citing time windows, for example, 2003–2008. The source of data is the Web of Science®, and the dataset is identical to that used in the production of research fronts for the Essential Science Indicators from

Procedure

Starting with the list of co-cited document pairs converted to cosine similarities for a given time window, we specify a starting document as a seed and an initial similarity threshold, then form a connected graph by traversing all links at or above the threshold. This creates a connected network in which any node can be reached, directly or indirectly, from the seed document or from any other node in the network.
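
The traversal described above can be illustrated with a minimal sketch. It is not the author's implementation: the function name `grow_cluster`, the breadth-first strategy, and the toy similarity values are assumptions chosen for illustration only. It takes a list of (document, document, cosine similarity) triples, keeps only links at or above the threshold, and collects every document reachable from the seed.

```python
from collections import defaultdict, deque


def grow_cluster(pairs, seed, threshold):
    """Return the set of documents reachable from `seed` using only
    links whose similarity is at or above `threshold`."""
    # Build an undirected adjacency list from the retained links.
    neighbors = defaultdict(list)
    for a, b, sim in pairs:
        if sim >= threshold:
            neighbors[a].append(b)
            neighbors[b].append(a)

    # Breadth-first traversal starting at the seed document.
    cluster = {seed}
    queue = deque([seed])
    while queue:
        doc = queue.popleft()
        for other in neighbors[doc]:
            if other not in cluster:
                cluster.add(other)
                queue.append(other)
    return cluster


# Hypothetical toy data: (document, document, cosine similarity).
pairs = [
    ("A", "B", 0.60),
    ("B", "C", 0.45),
    ("C", "D", 0.20),
    ("D", "E", 0.50),
    ("E", "F", 0.40),
    ("G", "H", 0.70),
]

# Cluster size for seed "A" as the threshold is lowered step by step.
sizes = {t: len(grow_cluster(pairs, "A", t)) for t in (0.5, 0.4, 0.3, 0.2)}
print(sizes)
```

In this toy example the cluster seeded at "A" holds 2 documents at a threshold of 0.5, 3 at 0.4 and 0.3, and 6 at 0.2: a single weak link (C to D) merges two previously separate groups, which is a small-scale analogue of the sudden expansion discussed in the abstract.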

We define the critical threshold for a cluster as the value of the similarity

Results

After all 113 seed documents were analyzed in the above manner, aggregate statistics were compiled including the distribution of critical levels, the sizes of clusters at the critical level, and the percentage increases which mark the critical transition. The most striking finding is that the critical thresholds for the sample distribute in an approximately normal fashion with a grouping around a mean cosine value of 0.18 (Fig. 6), with a slight right skew toward higher critical thresholds. The

Discussion and conclusions

We have seen that in most instances the giant component emerges independently of the starting seed document if the coefficient of similarity is sufficiently small. Thus, the giant component is an inherent feature of the network as a whole, indicating that scientific specialties and disciplines are somehow connected. The nature of such interdisciplinary connections is extremely interesting and is addressed in another paper (Small, 1999b).
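
The seed-independence of the giant component can be illustrated by computing all connected components of the thresholded graph at once, rather than growing a cluster from one seed. The sketch below is an assumption-laden illustration, not the procedure used in the paper: it swaps in a union-find structure, and the names `components` and `largest_share` and the toy data are hypothetical.

```python
def components(pairs, threshold):
    """Return the connected components (as sets of documents) of the graph
    that keeps only links with similarity at or above `threshold`,
    sorted from largest to smallest."""
    parent = {}

    def find(x):
        # Register unseen documents, then follow parent links to the root,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, sim in pairs:
        # Look up both endpoints so that documents whose links all fall
        # below the threshold still appear as singleton components.
        root_a, root_b = find(a), find(b)
        if sim >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for doc in parent:
        groups.setdefault(find(doc), set()).add(doc)
    return sorted(groups.values(), key=len, reverse=True)


def largest_share(pairs, threshold):
    """Fraction of all documents that belong to the largest component."""
    comps = components(pairs, threshold)
    return len(comps[0]) / sum(len(c) for c in comps)


# Hypothetical toy data: (document, document, cosine similarity).
pairs = [
    ("A", "B", 0.60),
    ("B", "C", 0.45),
    ("C", "D", 0.20),
    ("D", "E", 0.50),
    ("E", "F", 0.40),
    ("F", "G", 0.15),
    ("G", "H", 0.70),
]

print(largest_share(pairs, 0.5))  # several small components
print(largest_share(pairs, 0.1))  # one component holding every document
```

With these toy values the largest component holds a quarter of the documents at a threshold of 0.5 and all of them at 0.1. Once one component dominates in this way, a cluster grown from almost any seed ends up in the same component, which is why the result does not depend on the starting document.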

In a majority of cases, the largest percentage
