The effects of measurement error in case of scientific network analysis

Scientometrics

Abstract

Scientific network analysis takes as input large amounts of bibliographic data that are often incomplete. This incompleteness introduces different measurement errors into scientific networks, which in turn influence the results of their analysis. Several authors have studied the effects of measurement error on the results of network analysis, but these studies mostly rely on data gathered by survey questionnaires, or on incomplete data that are modeled as random processes and that arise in unweighted undirected networks. This article aims to overcome the limitations of these studies in three directions. First, we introduce measurement errors into network data following three well-known problems frequently present in bibliographic data: multiple authorship, homographs, and synonyms. Second, we apply missing data mechanisms to the identified incomplete data sources in order to link the latter with the probability of their occurrence. Third, we apply the incomplete data sources to different types of scientific networks and study the effects of measurement error in both the weighted directed (i.e., citation) network and the weighted undirected (i.e., co-authorship) network. The results show that the most destructive incomplete data source is the problem of synonyms: it has the greatest influence on the accuracy and robustness of the network structural measures. The multiple-authorship problem, on the other hand, does not influence the results of network analysis at all.
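To make two of the error sources named in the abstract concrete, the following is a minimal sketch of how synonyms and homographs might be injected into a small weighted co-authorship network, and how the weighted degree changes as a result. It is an illustration only, not the procedure used in the article: the function names, the toy edge list, the `#1`/`#2` name-variant suffixes, and the uniform random assignment of ties to name variants are all assumptions made here. Synonyms are taken to mean one author appearing under several names, and homographs to mean distinct authors sharing one name.

```python
import random
from collections import defaultdict


def degree(edges):
    """Weighted degree of every node in an undirected weighted edge list."""
    deg = defaultdict(float)
    for (u, v), w in edges.items():
        deg[u] += w
        deg[v] += w
    return dict(deg)


def inject_synonyms(edges, authors_to_split, rng):
    """Synonym error (hypothetical model): each chosen author is split into
    two name variants, and every tie of that author is assigned to one of
    the variants uniformly at random."""
    split = set(authors_to_split)
    noisy = defaultdict(float)
    for (u, v), w in edges.items():
        u2 = f"{u}#{rng.randint(1, 2)}" if u in split else u
        v2 = f"{v}#{rng.randint(1, 2)}" if v in split else v
        noisy[tuple(sorted((u2, v2)))] += w
    return dict(noisy)


def inject_homographs(edges, keep, absorbed):
    """Homograph error (hypothetical model): two distinct authors are
    treated as one node; ties between them become self-loops and are dropped."""
    merged = defaultdict(float)
    for (u, v), w in edges.items():
        u2 = keep if u == absorbed else u
        v2 = keep if v == absorbed else v
        if u2 == v2:
            continue  # a tie between the two merged authors disappears
        merged[tuple(sorted((u2, v2)))] += w
    return dict(merged)


# Toy co-authorship network: edge weight = number of joint papers (made-up data).
edges = {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 3, ("C", "D"): 1}

if __name__ == "__main__":
    rng = random.Random(0)
    print("true degree:      ", degree(edges))
    print("synonyms on C:    ", degree(inject_synonyms(edges, {"C"}, rng)))
    print("homographs A = B: ", degree(inject_homographs(edges, "A", "B")))
```

In this sketch the synonym error keeps the total tie weight of the network unchanged but divides the degree of the affected author between the name variants, whereas the homograph error inflates the degree of the merged node and removes any tie between the two merged authors.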

Acknowledgments

We thank the anonymous reviewer for providing helpful and constructive comments on an earlier version of the manuscript. We acknowledge the financial support of the Slovenian Research Agency through a Grant for training of young researchers and the Grant Number P5-0093 (B).

Author information

Corresponding author

Correspondence to Nuša Erman.

About this article

Cite this article

Erman, N., Todorovski, L. The effects of measurement error in case of scientific network analysis. Scientometrics 104, 453–473 (2015). https://doi.org/10.1007/s11192-015-1615-5

  • DOI: https://doi.org/10.1007/s11192-015-1615-5
