Abstract
In this paper we apply the information-theoretic Infomap algorithm (Rosvall and Bergstrom in Proc Natl Acad Sci 105(4):1118–1123, 2008) iteratively to cluster the direct citation network of the Astro Data Set (publications in 59 astrophysical journals between 2003 and 2010). We obtain 22 clusters of documents from the giant component of the network, which we interpret as constituting ‘topics’ in the field of astrophysics. Upon investigating the content of the topics, we find that topics group by shared features of their ‘journal signature’, that is, the journals that are most characteristic of a topic due to their popularity and distinctiveness. These groups of topics match sub-disciplines within the field. We generate a cognitive map of the field using a topic affinity network that shows which topics are disproportionately well connected (by citations) to other topics. The topology of the topic affinity network highlights a high-level organization of the field by sub-discipline and observational distance of the research object from Earth.
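The idea of topics being "disproportionately well connected" can be illustrated with a small sketch. The abstract does not define the affinity measure, so the ratio below (observed inter-topic citation links divided by the number expected if links were distributed in proportion to each topic's total inter-topic links) is an assumption made for illustration only, as are the function name `topic_affinity` and the toy counts.

```python
from collections import defaultdict


def topic_affinity(link_counts):
    """Observed / expected citation links for each pair of topics.

    ASSUMPTION: the expected count uses a simple null model in which links
    are distributed in proportion to the product of the two topics' total
    inter-topic link counts. This is not taken from the paper.
    """
    strength = defaultdict(int)
    for (a, b), n in link_counts.items():
        strength[a] += n
        strength[b] += n
    total = sum(strength.values())

    affinity = {}
    for (a, b), n in link_counts.items():
        expected = strength[a] * strength[b] / total
        affinity[(a, b)] = n / expected
    return affinity


# Hypothetical citation link counts between three topics.
link_counts = {("A", "B"): 30, ("A", "C"): 5, ("B", "C"): 15}
affinity = topic_affinity(link_counts)

# Pairs with a ratio above 1 are linked more often than the null model expects.
well_connected = sorted(pair for pair, ratio in affinity.items() if ratio > 1.0)
```

In this toy example the pairs A–B and B–C would appear as edges of an affinity network, while A–C would not.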



Notes
Technically, a similarity matrix can always be interpreted as an adjacency matrix that specifies, for each pair of nodes in a network, the strength of their connection. There is a practical difference, however, between cases where the data model operationalizes the relationship between entities by measuring some direct interaction, e.g. a citation from one document to another, and cases where the relationship is operationalized as a similarity, e.g. the similarity in how two documents cite or are cited by all other documents in the data set. In the former case the network will typically be sparse, whereas in the latter case the network is typically dense, and commonly a threshold is applied to suppress weak links between nodes and so make calculations on the network easier.
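As a minimal sketch of the thresholding step described in this note, the following reads a small symmetric similarity matrix as a weighted adjacency matrix and keeps only links at or above a cutoff. The matrix values, the cutoff of 0.5, and the helper names are hypothetical and chosen only for illustration.

```python
def threshold_similarity(sim, cutoff):
    """Return weighted edges (i, j, weight) for node pairs with similarity >= cutoff."""
    n = len(sim)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # symmetric matrix: visit each pair once
            if sim[i][j] >= cutoff:
                edges.append((i, j, sim[i][j]))
    return edges


def density(num_edges, num_nodes):
    """Fraction of possible undirected links that are present."""
    return num_edges / (num_nodes * (num_nodes - 1) / 2)


# Hypothetical similarity matrix for four documents.
similarity = [
    [1.0, 0.8, 0.1, 0.05],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.05, 0.1, 0.7, 1.0],
]

dense_edges = threshold_similarity(similarity, 0.0)   # every pair is linked
sparse_edges = threshold_similarity(similarity, 0.5)  # weak links suppressed
```

Without a cutoff every pair of documents is connected; with the cutoff only the two strong links remain, which is the sparsification the note refers to.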
Today, the hierarchical generalization of the map equation introduced in Rosvall and Bergstrom (2011) makes it possible to cluster networks hierarchically, with nested clusters, in a principled way; see http://www.mapequation.org/code.html.
Since we designed our workflow, alternatives to completely disregarding the directionality have become available, such as limiting the number of steps and performing unrecorded teleportation that does not influence the clustering (Lambiotte and Rosvall 2012). According to Rosvall (private communication), using the flag --undirdir in the code provided at http://www.mapequation.org/code.html triggers a two-mode dynamics that assumes undirected links for calculating flows, but directed links when minimizing the code length. It has been used, e.g., in Mirshahvalad et al. (2012) and West et al. (2016).
A more systematic and comprehensive method for determining variation in the form of a significance analysis is described in Rosvall and Bergstrom (2010).
One of the authors was trained in gravitational physics and worked for several years as managing editor of the scientific review journal Living Reviews in Relativity.
References
Bastian, M., Heymann, S., Jacomy, M., et al. (2009). Gephi: An open source software for exploring and manipulating networks. ICWSM, 8, 361–362.
Batagelj, V., & Mrvar, A. (2003). Analysis and visualization of large networks. Graph drawing software (pp. 77–103). Berlin: Springer.
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61(12), 2389–2404.
Cambrosio, A., Keating, P., & Mogoutov, A. (2004). Mapping collaborative work and innovation in biomedicine: A computer-assisted analysis of antibody reagent workshops. Social Studies of Science, 34(3), 325–364.
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.
Crane, D. (1972). Invisible colleges: Diffusion of knowledge in scientific communities. Chicago: The University of Chicago Press.
Ding, Y. (2011). Community detection: Topological vs. topical. Journal of Informetrics, 5(4), 498–514.
Dong, P., Loh, M., & Mondry, A. (2005). The “impact factor” revisited. Biomedical Digital Libraries, 2(7), 1–8.
Gläser, J. (2006). Wissenschaftliche Produktionsgemeinschaften: Die soziale Ordnung der Forschung, Campus Forschung (Vol. 906). Frankfurt/New York: Campus Verlag.
Gläser, J., Glänzel, W., & Scharnhorst, A. (2017). Towards a comparative approach to the identification of thematic structures in science. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2296-z.
Klavans, R., & Boyack, K. W. (2015). Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge? ArXiv e-prints. arXiv:1511.05078.
Koopman, R., & Wang, S. (2017). Mutual information based labelling and comparing clusters. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2305-2.
Koopman, R., Wang, S., & Scharnhorst, A. (2017). Contextualization of topics: Browsing through the universe of bibliographic information. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2303-4.
Lambiotte, R., & Rosvall, M. (2012). Ranking and clustering of nodes in networks with smart teleportation. Physical Review E, 85(5), 056107.
Mirshahvalad, A., Lindholm, J., Derlen, M., & Rosvall, M. (2012). Significant communities in large sparse networks. PLoS ONE, 7(3), e33721.
Möller, U. (2005). Estimating the number of clusters from distributional results of partitioning a given data set. Adaptive and natural computing algorithms (pp. 151–154). New York: Springer.
Morris, S., & Van der Veer Martens, B. (2008). Mapping research specialties. Annual Review of Information Science and Technology, 42(1), 213–295.
Newman, M. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256.
Rosvall, M., Axelsson, D., & Bergstrom, C. T. (2010). The map equation. The European Physical Journal Special Topics, 178(1), 13–23.
Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118–1123.
Rosvall, M., & Bergstrom, C. (2009). Fast stochastic and recursive search algorithm. http://www.tp.umu.se/~rosvall/algorithm.pdf.
Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.
Rosvall, M., & Bergstrom, C. T. (2011). Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE, 6(4), e18209.
Rousseau, R. (1999). Temporal differences in self-citation rates of scientific journals. Scientometrics, 44(3), 521–531.
Shibata, N., Kajikawa, Y., Takeda, Y., & Matsushima, K. (2009). Comparative study on methods of detecting research fronts using different types of citation. Journal of the American Society for Information Science and Technology, 60(3), 571–580.
Van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.
Van Eck, N. J., & Waltman, L. (2017). Citation-based clustering of publications using CitNetExplorer and VOSviewer. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2300-7.
Velden, T., Boyack, K., Gläser, J., Koopman, R., Scharnhorst, A., & Wang, S. (2017). Comparison of topic extraction approaches and their results. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2306-1.
Velden, T., Cambo, S., Ahmed, S., & Lagoze, C. (2013). Toward a time-sensitive mesoscopic analysis of co-author networks: A case study of two research specialties. In Proceedings of ISSI 2013 Vienna: 14th International Society of Scientometrics and Informetrics Conference.
Velden, T., Haque, A., & Lagoze, C. (2011). Resolving author name homonymy to improve resolution of structures in co-author networks. In JCDL’11. Ottawa.
Velden, T., Haque, A., & Lagoze, C. (2010). A new approach to analyzing patterns of collaboration in co-authorship networks: Mesoscopic analysis and interpretation. Scientometrics, 85(1), 219–242.
Velden, T., & Lagoze, C. (2013). The extraction of community structures from publication networks to support ethnographic observations of field differences in scientific communication. Journal of the American Society for Information Science and Technology, 64(12), 2405–2427.
Velden, T., Yan, S., Yu, K., & Lagoze, C. (2015). Mapping the evolution of scientific community structures in time. In Proceedings of the 24th international conference on World Wide Web (pp. 1039–1044). ACM.
West, J. D., Wesley-Smith, I., & Bergstrom, C. T. (2016). A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data, 1, 1–1.
Zuccala, A. (2006). Author cocitation analysis is to intellectual structure as web colink analysis is to..? Journal of the American Society for Information Science and Technology, 57(11), 1486–1501.
Acknowledgements
We gratefully acknowledge funding from SMA 1258891 EAGER: Collaborative Research: Scientific Collaboration in Time, as well as a travel grant from the intergovernmental framework for European Cooperation in Science and Technology (COST, Action: TD1210). We further thank Martin Rosvall for comments on pertinent new developments of the Infomap algorithm.
Cite this article
Velden, T., Yan, S. & Lagoze, C. Mapping the cognitive structure of astrophysics by infomap clustering of the citation network and topic affinity analysis. Scientometrics 111, 1033–1051 (2017). https://doi.org/10.1007/s11192-017-2299-9
DOI: https://doi.org/10.1007/s11192-017-2299-9