
Mapping the cognitive structure of astrophysics by Infomap clustering of the citation network and topic affinity analysis

Scientometrics

Abstract

In this paper we apply the information-theoretic Infomap algorithm (Rosvall and Bergstrom in Proc Natl Acad Sci 105(4):1118–1123, 2008) iteratively to cluster the direct citation network of the Astro Data Set (publications in 59 astrophysical journals between 2003 and 2010). We obtain 22 clusters of documents from the giant component of the network, which we interpret as constituting ‘topics’ in the field of astrophysics. Upon investigating the content of the topics, we find a grouping of topics by shared features of their ‘journal signature’, that is, the journals that are most characteristic of a topic due to their popularity and distinctiveness. These groups of topics match sub-disciplines within the field. We generate a cognitive map of the field using a topic affinity network that shows which topics are disproportionately well connected (by citations) to other topics. The topology of the topic affinity network highlights a high-level organization of the field by sub-discipline and by observational distance of the research object from Earth.
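The idea of topics being "disproportionately well connected" can be illustrated with a minimal sketch. It rests on an assumption that the abstract does not confirm: affinity is taken here as the ratio of observed citations from one topic to another over the number expected if citations were distributed in proportion to each topic's outgoing and incoming citation totals. The function name `topic_affinity` and the toy data are hypothetical, and the authors' actual definition may differ.

```python
from collections import Counter

def topic_affinity(citations, topic_of):
    """Return observed/expected ratios for citations between distinct topics.

    citations: iterable of (citing_doc, cited_doc) pairs
    topic_of:  dict mapping each document to its topic label
    Assumption: expected count = out(s) * in(t) / total citations.
    """
    pair_counts = Counter()  # citations from topic s to topic t
    out_counts = Counter()   # citations leaving topic s
    in_counts = Counter()    # citations arriving at topic t
    total = 0
    for citing, cited in citations:
        s, t = topic_of[citing], topic_of[cited]
        pair_counts[(s, t)] += 1
        out_counts[s] += 1
        in_counts[t] += 1
        total += 1

    affinity = {}
    for (s, t), observed in pair_counts.items():
        if s == t:
            continue  # only links between different topics are of interest
        expected = out_counts[s] * in_counts[t] / total
        affinity[(s, t)] = observed / expected
    return affinity

# Toy example with three topics (hypothetical data).
topic_of = {"a": "A", "b": "A", "c": "B", "d": "B", "e": "C"}
citations = [("a", "b"), ("a", "c"), ("b", "c"),
             ("c", "d"), ("d", "e"), ("e", "d")]
affinity = topic_affinity(citations, topic_of)
```

A ratio above 1 marks a topic pair that is linked more strongly than its citation volumes alone would suggest. Keeping only such pairs as edges would give one possible affinity network.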

Notes

  1. Technically, a similarity matrix can always be interpreted as an adjacency matrix that specifies, for each pair of nodes in a network, the strength of their connection. There is a practical difference, however, between cases where the data model operationalizes the relationship between entities by measuring some direct interaction, e.g. a citation from one document to another, and cases where the relationship is operationalized as a similarity, e.g. the similarity in how two documents cite or are cited by all other documents in the data set. In the former case the network will typically be sparse, whereas in the latter case the network is typically dense, and commonly some threshold is applied to suppress weak links between nodes and so make calculations on the network easier.

  2. Today, the hierarchical generalization of the map equation introduced in Rosvall and Bergstrom (2011) makes it possible to cluster networks hierarchically, with nested clusters, in a principled way; see http://www.mapequation.org/code.html.

  3. Since we designed our workflow, alternatives to completely disregarding the directionality have become available, such as limiting the number of steps and performing unrecorded teleportation that does not influence the clustering (Lambiotte and Rosvall 2012). According to Rosvall (private communication), using the flag --undirdir in the code provided at http://www.mapequation.org/code.html triggers a two-mode dynamics that assumes undirected links for calculating flows, but directed links when minimizing the code length. It has been used, e.g., in Mirshahvalad et al. (2012) and West et al. (2016).

  4. A more systematic and comprehensive method for determining variation in the form of a significance analysis is described in Rosvall and Bergstrom (2010).

  5. One of the authors was trained in gravitational physics and worked for several years as managing editor of the scientific review journal Living Reviews in Relativity.
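The distinction drawn in note 1 can be made concrete with a small sketch. It builds both kinds of network from the same toy reference lists: a sparse directed network of direct citations, and a similarity network in which two documents are linked by the cosine similarity of their reference sets (bibliographic coupling, chosen here only as one example of a similarity measure), with an optional threshold to suppress weak links. All names and data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt

def direct_citation_edges(references):
    """Directed edges (citing, cited), restricted to documents in the data set."""
    return {(doc, cited)
            for doc, refs in references.items()
            for cited in refs
            if cited in references}

def coupling_similarity(references, threshold=0.0):
    """Undirected weighted links: cosine similarity of two reference sets.

    Pairs whose similarity does not exceed `threshold` are dropped.
    """
    links = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared == 0:
            continue  # no common references, so no link (also avoids 0/0)
        sim = shared / sqrt(len(references[a]) * len(references[b]))
        if sim > threshold:
            links[(a, b)] = sim
    return links

# Toy data set: each document maps to the set of documents it cites.
references = {
    "p1": {"p3", "p4"},
    "p2": {"p3", "p4"},
    "p3": {"p4"},
    "p4": set(),
}
edges = direct_citation_edges(references)
all_links = coupling_similarity(references)
strong_links = coupling_similarity(references, threshold=0.8)
```

In this toy case the threshold of 0.8 keeps only the link between p1 and p2, whose reference lists are identical. On real data the unthresholded similarity network grows much denser than the direct citation network, which is the practical difference the note describes.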

References

  • Bastian, M., Heymann, S., Jacomy, M., et al. (2009). Gephi: An open source software for exploring and manipulating networks. ICWSM, 8, 361–362.

  • Batagelj, V., & Mrvar, A. (2003). Analysis and visualization of large networks. Graph drawing software (pp. 77–103). Berlin: Springer.

  • Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61(12), 2389–2404.

  • Cambrosio, A., Keating, P., & Mogoutov, A. (2004). Mapping collaborative work and innovation in biomedicine: A computer-assisted analysis of antibody reagent workshops. Social Studies of Science, 34(3), 325–364.

  • Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.

  • Crane, D. (1972). Invisible colleges: Diffusion of knowledge in scientific communities. Chicago: The University of Chicago Press.

  • Ding, Y. (2011). Community detection: Topological vs. topical. Journal of Informetrics, 5(4), 498–514.

  • Dong, P., Loh, M., & Mondry, A. (2005). The “impact factor” revisited. Biomedical Digital Libraries, 2(7), 1–8.

  • Gläser, J. (2006). Wissenschaftliche Produktionsgemeinschaften: Die soziale Ordnung der Forschung, Campus Forschung (Vol. 906). Frankfurt/New York: Campus Verlag.

  • Gläser, J., Glänzel, W., & Scharnhorst, A. (2017). Towards a comparative approach to the identification of thematic structures in science. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2296-z.

  • Klavans, R., & Boyack, K. W. (2015). Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge? ArXiv e-prints. arXiv:1511.05078.

  • Koopman, R., & Wang, S. (2017). Mutual information based labelling and comparing clusters. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2305-2.

  • Koopman, R., Wang, S., & Scharnhorst, A. (2017). Contextualization of topics: Browsing through the universe of bibliographic information. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2303-4.

  • Lambiotte, R., & Rosvall, M. (2012). Ranking and clustering of nodes in networks with smart teleportation. Physical Review E, 85(5), 056107.

  • Mirshahvalad, A., Lindholm, J., Derlen, M., & Rosvall, M. (2012). Significant communities in large sparse networks. PLoS ONE, 7(3), e33721.

  • Möller, U. (2005). Estimating the number of clusters from distributional results of partitioning a given data set. Adaptive and natural computing algorithms (pp. 151–154). New York: Springer.

  • Morris, S., & Van der Veer Martens, B. (2008). Mapping research specialties. Annual Review of Information Science and Technology, 42(1), 213–295.

  • Newman, M. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256.

  • Rosvall, M., Axelsson, D., & Bergstrom, C. T. (2010). The map equation. The European Physical Journal Special Topics, 178(1), 13–23.

  • Rosvall, M., & Bergstrom, C. T. (2008). Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4), 1118–1123.

  • Rosvall, M., & Bergstrom, C. (2009). Fast stochastic and recursive search algorithm. http://www.tp.umu.se/~rosvall/algorithm.pdf.

  • Rosvall, M., & Bergstrom, C. T. (2010). Mapping change in large networks. PLoS ONE, 5(1), e8694.

  • Rosvall, M., & Bergstrom, C. T. (2011). Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE, 6(4), e18209.

  • Rousseau, R. (1999). Temporal differences in self-citation rates of scientific journals. Scientometrics, 44(3), 521–531.

  • Shibata, N., Kajikawa, Y., Takeda, Y., & Matsushima, K. (2009). Comparative study on methods of detecting research fronts using different types of citation. Journal of the American Society for Information Science and Technology, 60(3), 571–580.

  • Van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.

  • Van Eck, N. J., & Waltman, L. (2017). Citation-based clustering of publications using CitNetExplorer and VOSviewer. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2300-7.

  • Velden, T., Boyack, K., Gläser, J., Koopman, R., Scharnhorst, A., & Wang, S. (2017). Comparison of topic extraction approaches and their results. In J. Gläser, A. Scharnhorst & W. Glänzel (Eds.), Same data: Different results? Towards a comparative approach to the identification of thematic structures in science. Special Issue of Scientometrics. doi:10.1007/s11192-017-2306-1.

  • Velden, T., Cambo, S., Ahmed, S., & Lagoze, C. (2013). Toward a time-sensitive mesoscopic analysis of co-author networks: A case study of two research specialties. In Proceedings of ISSI 2013 Vienna: 14th International Society of Scientometrics and Informetrics Conference.

  • Velden, T., Haque, A., & Lagoze, C. (2011). Resolving author name homonymy to improve resolution of structures in co-author networks. In JCDL’11. Ottawa.

  • Velden, T., Haque, A., & Lagoze, C. (2010). A new approach to analyzing patterns of collaboration in co-authorship networks: Mesoscopic analysis and interpretation. Scientometrics, 85(1), 219–242.

  • Velden, T., & Lagoze, C. (2013). The extraction of community structures from publication networks to support ethnographic observations of field differences in scientific communication. Journal of the American Society for Information Science and Technology, 64(12), 2405–2427.

  • Velden, T., Yan, S., Yu, K., & Lagoze, C. (2015). Mapping the evolution of scientific community structures in time. In Proceedings of the 24th international conference on World Wide Web (pp. 1039–1044). ACM.

  • West, J. D., Wesley-Smith, I., & Bergstrom, C. T. (2016). A recommendation system based on hierarchical clustering of an article-level citation network. IEEE Transactions on Big Data, 1, 1–1.

  • Zuccala, A. (2006). Author cocitation analysis is to intellectual structure as web colink analysis is to..? Journal of the American Society for Information Science and Technology, 57(11), 1486–1501.


Acknowledgements

We gratefully acknowledge funding from SMA 1258891 EAGER: Collaborative Research: Scientific Collaboration in Time, as well as a travel grant from the intergovernmental framework for European Cooperation in Science and Technology (COST, Action: TD1210). We further thank Martin Rosvall for comments on pertinent new developments of the Infomap algorithm.

Author information

Corresponding author

Correspondence to Theresa Velden.

Cite this article

Velden, T., Yan, S. & Lagoze, C. Mapping the cognitive structure of astrophysics by Infomap clustering of the citation network and topic affinity analysis. Scientometrics 111, 1033–1051 (2017). https://doi.org/10.1007/s11192-017-2299-9

  • DOI: https://doi.org/10.1007/s11192-017-2299-9
