Elsevier

Journal of Informetrics

Volume 3, Issue 4, October 2009, Pages 332-340

Critical thresholds for co-citation clusters and emergence of the giant component

https://doi.org/10.1016/j.joi.2009.05.001

Abstract

The behavior of co-citation clusters is studied over a wide range of similarity values, and we demonstrate the existence of critical or percolation transitions marked by a sudden expansion of cluster size with a small decrease in similarity which, in most cases, reflects the emergence of a giant component on the overall graph for the dataset. The study was motivated by the question of how to set appropriate thresholds for delineating individual research areas that identify, as far as possible, natural boundaries, in view of the fact that a threshold or criterion appropriate for one area may not be appropriate for another. We explore the rate of change in cluster size as a possible boundary indicator. The relationship of this critical behavior to maps of science is discussed.

Section snippets

Background

From a mathematical point of view, a co-citation network is a one-mode projection of a bipartite network consisting of two types of vertices, citing documents and cited documents, analogous to co-author networks consisting of author and document vertices (Newman et al., 2001). In forming a co-citation network we usually begin by selecting a sample of cited documents and then compute all the co-citations between the document pairs. A coefficient of similarity is defined on each pair, such as
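The projection can be illustrated with a small sketch. The snippet is cut off before it gives the paper's exact coefficient, so this example assumes one common cosine form: the co-citation count of a pair divided by the square root of the product of the two documents' citation counts. The function name `cocitation_cosine` and the input format (a mapping from each citing document to the cited documents it references, already restricted to the sample of cited documents) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cocitation_cosine(citing_refs):
    """Project a citing -> cited bipartite network onto the cited documents.

    citing_refs: dict mapping each citing document to an iterable of the
    cited documents it references (hypothetical input format).
    Returns a dict mapping each unordered cited pair (a, b), a < b, to an
    assumed cosine similarity: cocitations / sqrt(cites_a * cites_b).
    """
    cites = Counter()    # how often each cited document is cited
    cocites = Counter()  # how often each pair is cited by the same paper
    for refs in citing_refs.values():
        refs = sorted(set(refs))  # dedupe, and order pairs consistently
        cites.update(refs)
        cocites.update(combinations(refs, 2))
    return {
        (a, b): n / sqrt(cites[a] * cites[b])
        for (a, b), n in cocites.items()
    }
```

With three citing papers that reference {A, B}, {A, B, C}, and {A, C}, document A is cited 3 times and B and C twice each; the pair (B, C) is co-cited once, so its assumed cosine is 1/sqrt(2 × 2) = 0.5.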

Data

The datasets used in this study each consist of about 50,000 highly cited papers: the top 1% of papers by citation count in each year and in each of 22 broad fields, covering a six-year rolling period. The same six-year period was used for both the cited and citing time windows (for example, 2003–2008). The source of data is the Web of Science®, and the dataset is identical to that used in the production of research fronts for the Essential Science Indicators from
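The "top 1% per year and field" selection can be sketched as a group-wise cutoff. This is only an illustration under stated assumptions: the record layout `(paper_id, year, field, citations)`, the function name `top_percent`, the rounding up of the cutoff, and the rule of keeping at least one paper per group are all hypothetical choices, not the Essential Science Indicators procedure itself.

```python
import math
from collections import defaultdict


def top_percent(papers, percent=1.0):
    """Keep the most-cited `percent`% of papers within each (year, field) group.

    papers: iterable of (paper_id, year, field, citations) tuples
    (hypothetical record layout). The cutoff is rounded up and at least one
    paper is kept per group; papers tied at the cutoff keep input order.
    """
    groups = defaultdict(list)
    for pid, year, field, cites in papers:
        groups[(year, field)].append((pid, cites))
    selected = []
    for members in groups.values():
        # Stable sort by citation count, highest first.
        members.sort(key=lambda m: m[1], reverse=True)
        k = max(1, math.ceil(len(members) * percent / 100))
        selected.extend(pid for pid, _ in members[:k])
    return selected
```

For a group of 200 papers the 1% cutoff keeps the 2 most-cited; for a one-paper group the minimum-of-one rule keeps that paper.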

Procedure

Starting with the list of co-cited document pairs converted to cosine similarities for a given time window, we specify a starting document (the seed) and an initial similarity threshold, and then form a connected graph by traversing all links at or above the threshold. The result is a connected network in which any node can be reached, directly or indirectly, from the seed document or from any other node in the network.
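The seed-based traversal described above amounts to a breadth-first search restricted to links at or above the threshold. A minimal sketch, assuming the similarities are held in a dict keyed by document pairs; the function name `seed_cluster` and that input format are hypothetical.

```python
from collections import defaultdict, deque


def seed_cluster(similarities, seed, threshold):
    """Return the set of documents reachable from `seed` using only links
    whose similarity is at or above `threshold` (breadth-first traversal).

    similarities: dict mapping (doc_a, doc_b) pairs to similarity values
    (hypothetical input format); links are treated as undirected.
    """
    adj = defaultdict(list)
    for (a, b), s in similarities.items():
        if s >= threshold:
            adj[a].append(b)
            adj[b].append(a)
    cluster = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in cluster:
                cluster.add(nbr)
                queue.append(nbr)
    return cluster
```

With links A–B at 0.8, B–C at 0.3, and C–D at 0.6, seeding at A gives {A, B} at a threshold of 0.5 and {A, B, C, D} at 0.3: lowering the threshold admits the weak B–C link, which pulls in the whole C–D group.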

We define the critical threshold for a cluster as the value of the similarity
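The definition above is cut off in this snippet. Going by the abstract (rate of change in cluster size as a boundary indicator) and the Results (percentage increases that mark the critical transition), one plausible reading, assumed here rather than taken from the paper, is: record the seed's cluster size while lowering the threshold in fixed steps, then take the level with the largest percentage increase over the previous step. The function name `critical_threshold` and the convention of reporting the lower threshold of the jump are hypothetical.

```python
def critical_threshold(profile):
    """Locate the step with the largest percentage growth in cluster size.

    profile: list of (threshold, cluster_size) pairs ordered from the
    highest threshold to the lowest (hypothetical format). Sizes are
    assumed positive, since a seed's cluster always contains the seed.
    Returns (threshold, percent_increase) for the threshold at which the
    largest jump lands, or None if fewer than two levels are given.
    """
    best = None
    for (_, prev), (t, size) in zip(profile, profile[1:]):
        pct = 100.0 * (size - prev) / prev
        if best is None or pct > best[1]:
            best = (t, pct)
    return best
```

For an invented profile in which the cluster grows from 2 to 3 to 5 documents and then jumps to 60 as the threshold drops from 0.20 to 0.15, the function returns (0.15, 1100.0).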

Results

After all 113 seed documents were analyzed in this manner, aggregate statistics were compiled, including the distribution of critical levels, the sizes of clusters at the critical level, and the percentage increases that mark the critical transition. The most striking finding is that the critical thresholds for the sample are approximately normally distributed around a mean cosine value of 0.18 (Fig. 6), with a slight right skew toward higher critical thresholds. The
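A "slight right skew" can be quantified with the moment-based skewness g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments; a positive value indicates a longer tail toward higher thresholds. The sketch below is an illustration only: the function name `summarize` is hypothetical, the paper does not say which skewness measure (if any) it used, and the test values are invented, not the study's data.

```python
from statistics import fmean


def summarize(thresholds):
    """Return (mean, g1) for a list of per-seed critical thresholds.

    g1 = m3 / m2**1.5 uses population central moments; g1 > 0 means the
    distribution has a longer tail toward higher thresholds. Raises
    ZeroDivisionError if all thresholds are identical (m2 == 0).
    """
    mu = fmean(thresholds)
    m2 = fmean((x - mu) ** 2 for x in thresholds)
    m3 = fmean((x - mu) ** 3 for x in thresholds)
    return mu, m3 / m2 ** 1.5
```

A symmetric sample such as [0.1, 0.2, 0.3] gives g1 close to zero, while [0.1, 0.1, 0.1, 0.4] gives a mean of 0.175 and a positive g1.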

Discussion and conclusions

We have seen that in most instances the giant component emerges independent of the starting seed document if the coefficient of similarity is sufficiently small. Thus, the giant component is an inherent feature of the network as a whole indicating that scientific specialties and disciplines are somehow connected. Of course, the nature of such inter-disciplinary connections is extremely interesting, and is addressed in another paper (Small, 1999b).
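Whether a giant component has formed can be checked on the whole graph rather than seed by seed: keep only links at or above a threshold and measure what share of documents falls in the largest connected component. Every seed inside that component yields the same cluster, which is why the outcome no longer depends on the starting document. The sketch below uses union-find, a standard technique that is not necessarily what the paper used; the function name `largest_component_share` and the pair-keyed input format are hypothetical.

```python
def largest_component_share(similarities, threshold):
    """Fraction of all documents in the largest connected component when
    only links with similarity >= threshold are kept (union-find).

    similarities: dict mapping (doc_a, doc_b) pairs to similarity values
    (hypothetical input format). Every document appearing in any pair is
    counted, even if all of its links fall below the threshold.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in similarities.items():
        ra, rb = find(a), find(b)  # registers both documents
        if s >= threshold and ra != rb:
            parent[ra] = rb  # merge the two components
    if not parent:
        return 0.0
    sizes = {}
    for node in parent:
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(parent)
```

Sweeping the threshold downward and watching this share jump sharply is one way to see the percolation-like transition on the overall graph.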

In a majority of cases, the largest percentage

References (13)

  • M. Tomassini et al. Empirical analysis of the evolution of a scientific collaboration network. Physica A (2007).
  • D. Achlioptas et al. Explosive percolation in random networks. Science (2009).
  • B. Bollobás et al. The phase transition and connectedness in uniformly grown random graphs. Lecture Notes in Computer Science (2004).
  • D.S. Callaway et al. Are randomly grown graphs really random? Physical Review E (2001).
  • T.M.J. Fruchterman et al. Graph drawing by force-directed placement. Software – Practice and Experience (1991).
  • R. Guimerà et al. Team assembly mechanisms determine collaboration network structure and team performance. Science (2005).

Cited by (24)

  • Sustainability in the collaborative economy: A bibliometric analysis reveals emerging interest

    2018, Journal of Cleaner Production
    Citation Excerpt :

    The threshold for citation and co-citation “can be influenced by formal considerations that support meaningful interpretation” (Shaw, 1985). If the threshold is too small, we may obtain what Small (2009) calls a “giant component”, where almost every document is connected with others. While this kind of huge cluster demonstrates unicity between each document, we cannot observe and analyze the distinctiveness between them (which was our case with low thresholds).

  • Research portfolio analysis and topic prominence

    2017, Journal of Informetrics
    Citation Excerpt :

    If thresholds set were lower, topics very quickly coalesced into a giant component. Small (2009) observed similar behavior when linking individual documents into co-citation clusters. In contrast to co-citation clusters, topics identified using direct citation and time windows of at least 10 years, while they have a great deal of dynamic variation, are also very stable in that most topics last for many years with relatively low birth and death rates (Small et al., 2014).

  • Research perspective of artificial intelligence and HRM: a bibliometric study

    2023, International Journal of Business Innovation and Research