Subject clustering analysis based on ISI category classification
Introduction
A series of previous studies focused on journal clustering based on a complete journal–journal cross-citation matrix (Janssens et al., 2009, Zhang et al., 2009a, Zhang et al., 2009b). ISI, now part of Thomson Scientific, has assigned each journal it covers to one or more subject categories. Based on this classification scheme, the journal–journal matrix can be aggregated to a category–category matrix, which is much more densely populated than the matrix at the journal level. The present study focuses on the analysis of the information flow among the ISI subject categories, for two main reasons. The first is to find an appropriate field structure of the Web of Science using the subject clustering algorithm developed in previous studies. Since ISI subject categories are based on journal assignment, the question arises as to what changes if journal cross-citation is replaced by subject cross-citation. If the changes are not essential, the elaborate clustering of more than 8000 journals could be replaced by a somewhat easier analysis of roughly 250 ISI categories, and the journal level could, as it were, be skipped. We stress, however, that cross-citations are calculated from individual paper-to-paper links, whatever aggregation level is chosen. The second reason is to analyze whether the assignment of journals to multiple subject categories interferes with, distorts or even determines the resulting cluster structure. Before introducing the methodological rudiments, we briefly summarise the historical background and the outcomes of previous and related studies.
Along with the development of computerised scientometrics, the mapping of science has come to play an important role in constructing and analyzing the structure of science. For instance, a variety of techniques for analyzing journal–journal citation relationships have been reported in the literature (Doreian and Fararo, 1985, Leydesdorff, 2006, Tijssen et al., 1987). Co-citation clustering, an alternative method, was investigated for constructing a World Atlas of Sciences for ISI (Garfield et al., 1975, Leydesdorff, 1987, Small, 1999). Boyack, Klavans, and Börner (2005) applied eight alternative measures of journal similarity to a dataset of 7121 journals covering over one million documents in the combined Science Citation and Social Sciences Citation Indexes, and produced a global map of science using the force-directed graph layout tool VxOrd. Chen (2008) proposed an approach to classifying scientific networks in terms of the aggregated journal–journal citation relations of the ISI Journal Citation Reports using the affinity propagation method. As mentioned at the outset, Zhang, Glänzel, et al. (2009) and Zhang, Janssens, et al. (2009) have also investigated different methods for analyzing and classifying scientific journals. Besides using journals as units of analysis, some recent investigations study the structure of science on the basis of subject categories. Glänzel and Schubert (2003) designed a new classification scheme of science fields and subfields for scientometric evaluation purposes. Moya-Anegón et al. (2004) proposed a new technique that uses thematic categories as the entities of co-citation, and presented an ego-centred network of 222 ISI categories covering the sciences and social sciences. Leydesdorff and Rafols (2009) classified the 172 ISI science categories into 14 groups using factor analysis, and compared the interdisciplinarity of the categories using betweenness centrality.
In contrast to these studies, we apply a new clustering technique to classify the ISI science and social sciences categories into 7 groups on the basis of category–category cross-citation similarities, and compare the results with the 7-cluster hybrid clustering solution for 8305 journals obtained in a previous study (Zhang, Janssens, et al., 2009). Furthermore, several indicators are used to analyze the communication characteristics of the different categories.
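The excerpt does not say how the category–category cross-citation similarities are computed. The sketch below shows one common choice, offered only as an assumption: treat C[i, j] as the number of citations from category i to category j, ignore citation direction by symmetrizing A = C + Cᵀ, and take the cosine between the rows of A. The function name `cross_citation_similarity` is hypothetical.

```python
import numpy as np


def cross_citation_similarity(C):
    """Cosine similarity between category citation profiles.

    Assumption (not stated in the paper): C[i, j] counts citations from
    category i to category j, and citation direction is ignored.
    """
    C = np.asarray(C, dtype=float)
    A = C + C.T                       # undirected link strength between categories
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0           # isolated categories keep an all-zero profile
    U = A / norms[:, None]            # scale each row profile to unit length
    return U @ U.T                    # S[i, j] = cosine of rows i and j


# Toy example: categories 0 and 1 exchange citations, category 2 is isolated.
C = [[10, 2, 0],
     [1, 8, 0],
     [0, 0, 5]]
S = cross_citation_similarity(C)
print(np.round(S, 3))
```

Keeping the diagonal (within-category citations) matters in this sketch: without it, two categories that cite only each other would have orthogonal profiles and a similarity of zero.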
Section snippets
Data sources and processing
The data were collected from the Web of Science of Thomson Reuters. Altogether 9487 journals assigned to the 246 categories of the sciences, social sciences and arts and humanities in the period 2002–2006 were selected, and only three document types, namely article, letter and review, were taken into consideration. More than six million papers were indexed, and citations were summed up using a variable citation window, from the publication year through 2006.
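A variable citation window means that a paper published in 2002 collects citations over five years (2002–2006), while a paper published in 2006 collects them over one. The sketch below illustrates this selection and counting on toy data. It assumes that both the citing and the cited paper must belong to the selected set (2002–2006, article/letter/review); that reading, the data layout and the function names are hypothetical.

```python
from collections import Counter

KEPT_TYPES = {"article", "letter", "review"}
FIRST_YEAR, LAST_YEAR = 2002, 2006


def window_length(pub_year):
    """Years in the variable citation window: publication year through 2006."""
    return LAST_YEAR - pub_year + 1


def count_citations(papers, links):
    """Count citations received by each selected paper.

    papers: dict paper_id -> (publication_year, document_type)
    links:  iterable of (citing_id, cited_id) pairs
    Assumption: both citing and cited papers must be in the selection.
    """
    selected = {
        pid for pid, (year, doc_type) in papers.items()
        if FIRST_YEAR <= year <= LAST_YEAR and doc_type in KEPT_TYPES
    }
    counts = Counter()
    for citing, cited in links:
        if citing in selected and cited in selected:
            # The citation falls in the window if it appears no earlier than
            # the cited paper's publication year (no later than 2006 is
            # already guaranteed by the selection).
            if papers[citing][0] >= papers[cited][0]:
                counts[cited] += 1
    return counts


papers = {
    "A": (2002, "article"),
    "B": (2004, "review"),
    "C": (2006, "letter"),
    "D": (2005, "editorial"),   # excluded document type
    "E": (2007, "article"),     # outside the 2002-2006 period
}
links = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("E", "A")]
print(count_citations(papers, links))              # Counter({'A': 2, 'B': 1})
print(window_length(2002), window_length(2006))    # 5 1
```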
Methods
As mentioned at the outset, citation links are determined on the basis of paper-by-paper assignment, which offers several advantages over other approaches (Zhang, Glänzel, et al., 2009). The cross-citation data are aggregated in three steps: document-to-document, then journal-to-journal, and finally ISI category-to-category (or large domain-to-domain). The previous work focussed on the journal level; the focus now turns to the higher level of ISI subject
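The three aggregation steps can be sketched as follows. The excerpt does not state how a link is credited when a journal belongs to several categories, so the sketch offers two hypothetical options: whole counting (every category pair receives the full link) and fractional counting (the link is split evenly over the pairs). The function name and data layout are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product


def aggregate(paper_links, paper_journal, journal_categories, fractional=False):
    """Roll paper-to-paper links up to journal and then category level.

    paper_links:        iterable of (citing_paper, cited_paper)
    paper_journal:      dict paper -> journal
    journal_categories: dict journal -> list of ISI categories
    """
    # Step 2: journal-to-journal cross-citation counts.
    journal_matrix = defaultdict(float)
    for citing, cited in paper_links:
        journal_matrix[(paper_journal[citing], paper_journal[cited])] += 1

    # Step 3: category-to-category counts. Whole counting credits every
    # category pair of a multiply assigned journal with the full link;
    # fractional counting splits the link evenly over those pairs.
    category_matrix = defaultdict(float)
    for (j_from, j_to), n in journal_matrix.items():
        cats_from, cats_to = journal_categories[j_from], journal_categories[j_to]
        weight = n / (len(cats_from) * len(cats_to)) if fractional else n
        for c_from, c_to in product(cats_from, cats_to):
            category_matrix[(c_from, c_to)] += weight
    return dict(journal_matrix), dict(category_matrix)


paper_links = [("p1", "p2"), ("p3", "p2")]
paper_journal = {"p1": "J1", "p2": "J2", "p3": "J1"}
journal_categories = {"J1": ["Cat A"], "J2": ["Cat A", "Cat B"]}
print(aggregate(paper_links, paper_journal, journal_categories))
print(aggregate(paper_links, paper_journal, journal_categories, fractional=True))
```

In this toy case whole counting yields a category-level total of 4 from only 2 paper links, while fractional counting preserves the total of 2. This illustrates one concrete way in which multiple journal assignment could affect a category-level cluster structure.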
Results
The number of subject assignments in the Web of Science (SCIE, SSCI, AHCI) is 14,608 for the 9487 journals in 2002–2006, that is, roughly 1.54 categories per journal. The average number of journals per category is 59.4. Fig. 1 presents the 15 largest ISI subject categories, each of which contains more than 150 journals.
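Both averages follow directly from the totals: 14,608 assignments divided by 9487 journals, and 14,608 assignments divided by 246 categories. Because a journal with several categories is counted once in each, the second ratio uses assignments rather than journals. The hypothetical helper below computes these ratios, plus the share of single-assignment journals, from a journal-to-categories mapping.

```python
def assignment_stats(journal_categories):
    """Summarise a journal -> ISI categories assignment.

    journal_categories: dict journal -> collection of category names
    """
    n_journals = len(journal_categories)
    n_assignments = sum(len(cats) for cats in journal_categories.values())
    n_categories = len({c for cats in journal_categories.values() for c in cats})
    n_single = sum(1 for cats in journal_categories.values() if len(cats) == 1)
    return {
        "assignments": n_assignments,
        "categories_per_journal": n_assignments / n_journals,
        "journals_per_category": n_assignments / n_categories,
        "share_single_assignment": n_single / n_journals,
    }


# Reproducing the two averages from the totals reported in the text.
print(round(14608 / 9487, 2))   # 1.54 categories per journal
print(round(14608 / 246, 1))    # 59.4 journals per category

# Toy mapping: 4 journals, 5 assignments, 3 categories.
toy = {"J1": ["A"], "J2": ["A", "B"], "J3": ["B"], "J4": ["C"]}
print(assignment_stats(toy))
```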
Among the 9487 journals under study, roughly 60% are assigned to a single ISI subject category, and the others to multiple categories. The most
Conclusions and discussions
The Multi-level Aggregation Method yields a balanced clustering of the ISI subject categories. The clusters are clearly distinct from one another, and each represents one scientific domain. Several indicators have been used to compare the communication characteristics of the different ISI subject categories. In general, social sciences categories tend to spread their links more widely, while science categories tend to have deep
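The excerpt does not describe the Multi-level Aggregation Method. Assuming that it corresponds to the multi-level community detection of Blondel et al. (2008, "Fast unfolding of communities in large networks", listed in the references), the method maximises the modularity Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j) of a weighted undirected graph. The sketch computes Q and runs a deliberately naive version of the method's first phase (greedy node moves, recomputing Q from scratch). It omits the incremental gain formula and the community-aggregation phase of the real algorithm. Function names are hypothetical.

```python
import numpy as np


def modularity(A, labels):
    """Newman modularity of a partition of a weighted, undirected graph.

    Q = (1/2m) * sum_ij [A_ij - k_i k_j / (2m)] * delta(c_i, c_j)
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    two_m = A.sum()                          # total weight counted in both directions
    k = A.sum(axis=1)                        # weighted degree of each node
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)


def local_moving(A, max_sweeps=100):
    """Naive first phase of a Louvain-style method.

    Each node moves to the neighbouring community that most increases Q;
    the real method uses an incremental gain formula, then aggregates
    communities into super-nodes and repeats on the smaller graph.
    """
    A = np.asarray(A, dtype=float)
    labels = np.arange(len(A))               # start with one node per community
    best_q = modularity(A, labels)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(A)):
            best_label = labels[i]
            for c in set(labels[np.nonzero(A[i])[0]]):
                labels[i] = c
                q = modularity(A, labels)
                if q > best_q + 1e-12:
                    best_q, best_label, improved = q, c, True
            labels[i] = best_label           # keep the best move (or stay)
        if not improved:
            break
    return labels, best_q


# Two triangles joined by a single edge (2-3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
labels, q = local_moving(A)
print(labels, round(q, 3))
```

On this toy graph the moves separate the two triangles, with Q = 5/14 ≈ 0.357. A real category network would be the symmetrized category–category matrix, with far more nodes and weighted links.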
Acknowledgement
The research was supported by the Centre for R&D Monitoring of the Flemish Government, the National Natural Science Foundation of China (grant no. 70673019), the China Scholarship Council and ERCMAMT (Engineering Research Centre of Metallurgical Automation and Measurement Technology, Ministry of Education, Hubei, China).
References (18)
- Hybrid clustering for validation and improvement of subject-classification schemes. Information Processing & Management (2009).
- Pajek–analysis and visualization of large networks. Graph Drawing (2002).
- Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment (2008).
- Mapping the backbone of science. Scientometrics (2005).
- Classification of scientific networks using aggregated journal–journal citation relations in the journal citation reports. Journal of the American Society for Information Science and Technology (2008).
- Structural equivalence in a journal network. Journal of the American Society for Information Science (1985).
- A system for automatic classification of scientific literature. Journal of the Indian Institute of Science (1975).
- A new classification scheme of science fields and subfields designed for scientometric evaluation purposes. Scientometrics (2003).
- Various methods for the mapping of science. Scientometrics (1987).
Cited by (60)
Stochastic block model reveals maps of citation patterns and their evolution in time
2018, Journal of Informetrics. Citation excerpt: We use journal citation networks from Thomson Reuters Citation Index® for the years ranging from 1900 to 2013 which contains hundreds of millions of citations. Many studies concentrated on small subsets of the citation network (An, Janssen, & Milios, 2004; Grossman, 2002; Nerur et al., 2005; Pieters, Baumgartner, Vermunt, & Bijmolt, 1999; Porter & Rafols, 2009; Shibata et al., 2011; Zhang et al., 2010), while others were interested in large-scale patterns (Boyack et al., 2005; de Moya-Anegón et al., 2007; Leydesdorff & Rafols, 2009). We focus on the large scale citation networks that are constructed using all articles in this bibliographic dataset.
Granularity of algorithmically constructed publication-level classifications of research publications: Identification of topics
2018, Journal of Informetrics. Citation excerpt: More advanced approaches have been proposed for journal classification in recent decades. These approaches use citation relations between journals for their classification (Archambault, Caruso, & Beauchesne, 2011; Boyack, Klavans, & Börner, 2005; Chen, 2008; Doreian, 1988; Leydesdorff, 1987, 2006; Leydesdorff, Bornmann, & Wagner, 2017; Pudovkin & Garfield, 2002; Rosvall & Bergstrom, 2011; Small & Koenig, 1977; Zhang, Liu, Janssens, Liang, & Glänzel, 2010). The many limits of journal-level classification have been acknowledged in the literature (Archambault et al., 2011).
Mapping science using Library of Congress Subject Headings
2017, Journal of Informetrics
OpCitance: Citation contexts identified from the PubMed Central open access articles
2023, Scientific Data