
Investigating the effect of global data on topic detection

Scientometrics

Abstract

A dataset containing 111,616 documents in astronomy and astrophysics (Astro-set) has been created and is being partitioned by several research groups using different algorithms. For this paper, rather than partitioning the dataset directly, we locate its documents in a previously created model of the full Scopus database. This makes it possible to compare community detection based on local data with community detection based on global data; that comparison is made in an accompanying paper. Here we begin to answer the question of how much the rest of a large database (a global solution) affects the partitioning of a smaller, journal-based set of documents (a local solution). We find that the Astro-set, while spread across hundreds of partitions in the Scopus map, is concentrated in only a few regions of the map. From this perspective there appears to be some correspondence between local information and the global cluster solution. However, we also show that within-Astro-set links account for only one-third of the total links available to these papers in the full Scopus database. The non-Astro-set links matter in two ways: (1) in areas where Astro-set papers are concentrated, related papers from non-astronomy journals are clustered together with the Astro-set papers, and (2) Astro-set papers with a very low fraction of within-set links tend to end up in clusters that are not astronomy-based. Overall, this work highlights limitations of using journal-based document sets to identify the structure of scientific fields.
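The abstract turns on two quantities: the share of the links touching the Astro-set that stay entirely within it, and, for each Astro-set paper, the fraction of its own links that point to other Astro-set papers. The sketch below shows one way these could be computed. It assumes that links are undirected pairs of document IDs with no duplicates or self-links; the function names, toy IDs, and the 0.4 "low fraction" cutoff are hypothetical illustrations, not the paper's data or method.

```python
from collections import defaultdict


def per_paper_within_fraction(links, doc_set):
    """For each paper in doc_set, return the fraction of its links whose
    other endpoint is also in doc_set (links treated as undirected)."""
    total = defaultdict(int)
    within = defaultdict(int)
    for a, b in links:
        # Count the link once from each endpoint's point of view.
        for src, dst in ((a, b), (b, a)):
            if src in doc_set:
                total[src] += 1
                if dst in doc_set:
                    within[src] += 1
    return {p: within[p] / total[p] for p in total}


def set_within_share(links, doc_set):
    """Share of links touching doc_set that have both endpoints inside it."""
    touching = [(a, b) for a, b in links if a in doc_set or b in doc_set]
    inside = [(a, b) for a, b in touching if a in doc_set and b in doc_set]
    return len(inside) / len(touching) if touching else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: A* are set papers, X* are papers outside the set.
    astro = {"A1", "A2", "A3"}
    links = [("A1", "A2"), ("A1", "X1"), ("A2", "X2"),
             ("A3", "X3"), ("A3", "X4"), ("A2", "A3")]
    print(set_within_share(links, astro))  # 2 of 6 links -> 0.333...
    fractions = per_paper_within_fraction(links, astro)
    low = sorted(p for p, f in fractions.items() if f < 0.4)  # hypothetical cutoff
    print(low)  # ['A3']
```

In this toy case only two of the six links touching the set stay inside it, and paper A3, with most of its links pointing outside the set, is the kind of paper the abstract reports as tending to land in non-astronomy clusters.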


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Acknowledgements

This paper benefitted greatly from reviews by Theresa Velden, Andrea Scharnhorst, and two anonymous referees.

Author information


Correspondence to Kevin W. Boyack.


About this article


Cite this article

Boyack, K.W. Investigating the effect of global data on topic detection. Scientometrics 111, 999–1015 (2017). https://doi.org/10.1007/s11192-017-2297-y



  • DOI: https://doi.org/10.1007/s11192-017-2297-y
