Critical thresholds for co-citation clusters and emergence of the giant component
Background
From a mathematical point of view, a co-citation network is a one-mode projection of a bipartite network consisting of two types of vertices, citing documents and cited documents, analogous to co-author networks consisting of author and document vertices (Newman et al., 2001). In forming a co-citation network we usually begin by selecting a sample of cited documents and then compute all the co-citations between the document pairs. A coefficient of similarity is defined on each pair, such as
Data
The datasets used in this study each consist of about 50,000 highly cited papers, which constitute the top 1% of papers by citation count within each publication year and each of 22 broad fields over a specific six-year rolling period. The same six-year period was used for both the cited and citing time windows, for example, 2003–2008. The source of data is the Web of Science®, and the dataset is identical to that used in the production of research fronts for the Essential Science Indicators from
Procedure
Starting with the list of co-cited document pairs converted to cosine similarities for a given time window, we can specify a starting document as a seed and an initial similarity threshold, and then form a connected graph by traversing all links at or above the threshold. This creates a connected network in which any node can be reached, directly or indirectly, from the seed document or from any other node in the network.
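The traversal described above can be illustrated with a minimal sketch. It assumes the cosine similarity takes the common form of the co-citation count divided by the square root of the product of the two documents' citation counts; the function names (`cosine_links`, `cluster_from_seed`), the breadth-first traversal order, and the toy reference lists are illustrative assumptions, not the procedure or data of the study.

```python
from collections import defaultdict, deque
from itertools import combinations
from math import sqrt


def cosine_links(reference_lists):
    """Map each co-cited pair (a, b), with a < b, to its cosine similarity.

    Each element of reference_lists is the list of sampled documents
    cited by one citing paper (hypothetical input format).
    """
    citations = defaultdict(int)    # times each document is cited
    cocitations = defaultdict(int)  # times each pair is cited together
    for refs in reference_lists:
        unique = sorted(set(refs))
        for doc in unique:
            citations[doc] += 1
        for a, b in combinations(unique, 2):
            cocitations[(a, b)] += 1
    return {
        (a, b): n / sqrt(citations[a] * citations[b])
        for (a, b), n in cocitations.items()
    }


def cluster_from_seed(links, seed, threshold):
    """Return all documents reachable from seed over links >= threshold."""
    neighbors = defaultdict(set)
    for (a, b), sim in links.items():
        if sim >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    cluster = {seed}
    queue = deque([seed])
    while queue:  # breadth-first traversal of qualifying links
        doc = queue.popleft()
        for nxt in neighbors[doc] - cluster:
            cluster.add(nxt)
            queue.append(nxt)
    return cluster


# Toy data: six citing papers and the sampled documents each one cites.
toy_reference_lists = [
    ["A", "B"], ["A", "B", "C"], ["B", "C"],
    ["C", "D"], ["D", "E"], ["D", "E"],
]
toy_links = cosine_links(toy_reference_lists)
for t in (0.7, 0.5, 0.3):
    print(t, sorted(cluster_from_seed(toy_links, "A", t)))
```

In this toy example, lowering the threshold lets the cluster grown from seed "A" absorb progressively more documents, which is the behavior the threshold sweep is meant to expose.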
We define the critical threshold for a cluster as the value of the similarity
Results
After all 113 seed documents were analyzed in the above manner, aggregate statistics were compiled, including the distribution of critical thresholds, the sizes of clusters at the critical threshold, and the percentage increases which mark the critical transition. The most striking finding is that the critical thresholds for the sample are approximately normally distributed around a mean cosine value of 0.18 (Fig. 6), with a slight right skew toward higher critical thresholds. The
Discussion and conclusions
We have seen that in most instances the giant component emerges regardless of the starting seed document, provided the similarity threshold is sufficiently small. Thus, the giant component is an inherent feature of the network as a whole, indicating that scientific specialties and disciplines are somehow connected. The nature of such interdisciplinary connections is of considerable interest and is addressed in another paper (Small, 1999b).
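The seed-independence observation can be checked on any set of similarity links with a short sketch. The version below labels connected components with a union-find structure (a substitute for the seed-by-seed traversal, chosen here only for brevity) and reports whether each seed falls in the largest component at a given threshold. The function names and the toy similarity values are hypothetical and are not taken from the study.

```python
from collections import Counter


def component_labels(links, threshold):
    """Label each document with a representative of its component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), sim in links.items():
        ra, rb = find(a), find(b)  # registers both documents
        if sim >= threshold and ra != rb:
            parent[ra] = rb        # merge the two components
    return {doc: find(doc) for doc in list(parent)}


def seeds_in_largest_component(links, seeds, threshold):
    """Return the largest component's share of documents and, for each
    seed, whether it belongs to that component."""
    labels = component_labels(links, threshold)
    sizes = Counter(labels.values())
    giant, giant_size = sizes.most_common(1)[0]
    share = giant_size / len(labels)
    return share, {s: labels.get(s) == giant for s in seeds}


# Hypothetical similarity values, for illustration only.
example_links = {
    ("A", "B"): 0.80, ("B", "C"): 0.60, ("C", "D"): 0.20,
    ("D", "E"): 0.70, ("E", "F"): 0.15,
}
print(seeds_in_largest_component(example_links, ["A", "D"], 0.5))
print(seeds_in_largest_component(example_links, ["A", "D"], 0.1))
```

At the higher threshold the two seeds sit in different components; at the lower one both belong to a single component spanning every document, so the choice of seed no longer matters.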
In a majority of cases, the largest percentage
References (13)
- et al. Empirical analysis of the evolution of a scientific collaboration network. Physica A (2007)
- et al. Explosive percolation in random networks. Science (2009)
- et al. The phase transition and connectedness in uniformly grown random graphs. Lecture Notes in Computer Science (2004)
- et al. Are randomly grown graphs really random? Physical Review E (2001)
- et al. Graph drawing by force-directed placement. Software – Practice and Experience (1991)
- et al. Team assembly mechanisms determine collaboration network structure and team performance. Science (2005)
Cited by (24)
Sustainability in the collaborative economy: A bibliometric analysis reveals emerging interest
2018, Journal of Cleaner Production
Citation Excerpt: The threshold for citation and co-citation “can be influenced by formal considerations that support meaningful interpretation (Shaw, 1985). If the threshold is too small, we may obtain what Small (2009) calls a “giant component”, where almost every document is connected with others. While this kind of huge cluster demonstrates unicity between each document, we cannot observe and analyze the distinctiveness between them (which was our case with low thresholds).
Research portfolio analysis and topic prominence
2017, Journal of Informetrics
Citation Excerpt: If thresholds set were lower, topics very quickly coalesced into a giant component. Small (2009) observed similar behavior when linking individual documents into co-citation clusters. In contrast to co-citation clusters, topics identified using direct citation and time windows of at least 10 years, while they have a great deal of dynamic variation, are also very stable in that most topics last for many years with relatively low birth and death rates (Small et al., 2014).
Mapping the field of cultural evolutionary theory and methods in archaeology using bibliometric methods
2023, Humanities and Social Sciences Communications
Artificial intelligence and HRM: identifying future research Agenda using systematic literature review and bibliometric analysis
2023, Management Review Quarterly
Research perspective of artificial intelligence and HRM: a bibliometric study
2023, International Journal of Business Innovation and Research
Entrepreneurial approach for open innovation: opening new opportunities, mapping knowledge and highlighting gaps
2022, International Journal of Entrepreneurial Behaviour and Research