The effects of measurement error in case of scientific network analysis

Erman, Nuša; Todorovski, Ljupčo

doi:10.1007/s11192-015-1615-5

Published: 31 May 2015

Scientometrics, volume 104, pages 453–473 (2015)

Abstract

Scientific network analysis takes as input large amounts of bibliographic data that are often incomplete. This incompleteness introduces measurement errors into scientific networks, which in turn influence the results of scientific network analyses. Several authors have studied the effects of measurement error on the results of network analysis, but these studies mostly rely on data gathered by survey questionnaires, or they treat incomplete data as a random process and consider only unweighted undirected networks. This article aims to overcome these limitations in three directions. First, we introduce measurement errors into network data following three well-known problems frequently present in bibliographic data: multiple authorship, homographs, and synonyms. Second, we apply missing data mechanisms to the identified incomplete data sources in order to link the latter with the probability of their occurrence. Third, we apply the incomplete data sources to different types of scientific networks and study the effects of measurement error in both the weighted directed (i.e., citation) network and the weighted undirected (i.e., co-authorship) network. The results show that the most destructive incomplete data source is the problem of synonyms; it has the strongest influence on the accuracy and robustness of the network structural measures. On the other hand, the multiple-authorship problem does not influence the results of network analysis at all.
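
As an illustration of how two of these error sources can distort a co-authorship network, the following is a minimal Python sketch. It is not the procedure used in the article: the author names, the toy paper list, the function names, and the per-paper error probability `prob` are all hypothetical and chosen only for illustration. A synonym error is modeled as one author being recorded under a variant name with some probability, and a homograph error as distinct authors being collapsed into a single name. The structural measure compared is the weighted degree (strength) of each node.

```python
import random
from collections import Counter
from itertools import combinations


def coauthorship_network(papers):
    """Weighted undirected network: edge weight = number of co-authored papers."""
    edges = Counter()
    for authors in papers:
        # Each unordered pair of distinct authors on a paper adds 1 to the edge weight.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def strength(edges):
    """Weighted degree of every node (sum of incident edge weights)."""
    s = Counter()
    for (a, b), w in edges.items():
        s[a] += w
        s[b] += w
    return s


def add_synonyms(papers, author, variant, prob, rng):
    """Synonym error (assumed model): on each paper, `author` is recorded
    under the name `variant` with probability `prob`."""
    return [
        [variant if (name == author and rng.random() < prob) else name
         for name in authors]
        for authors in papers
    ]


def merge_homographs(papers, names, merged):
    """Homograph error (assumed model): the distinct authors in `names`
    are all recorded under the single name `merged`."""
    return [
        [merged if name in names else name for name in authors]
        for authors in papers
    ]


# Hypothetical toy data: one author list per paper.
papers = [
    ["Novak A", "Kovac B"],
    ["Novak A", "Kovac B", "Horvat C"],
    ["Novak A", "Zupan D"],
    ["Horvat C", "Zupan D"],
]

rng = random.Random(0)  # fixed seed so the synonym run is repeatable
true_strength = strength(coauthorship_network(papers))
synonym_strength = strength(
    coauthorship_network(add_synonyms(papers, "Novak A", "Novak A.", 0.5, rng))
)
homograph_strength = strength(
    coauthorship_network(merge_homographs(papers, {"Kovac B", "Zupan D"}, "Kovac B"))
)

print("true:     ", dict(true_strength))
print("synonyms: ", dict(synonym_strength))
print("homographs:", dict(homograph_strength))
```

In this toy model a synonym error splits one author's strength across two nodes, while a homograph error inflates the strength of the merged node. The sketch covers only the weighted undirected case; the citation network and the multiple-authorship problem would need their own, separately stated assumptions.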

Acknowledgments

We thank the anonymous reviewer for providing helpful and constructive comments on an earlier version of the manuscript. We acknowledge the financial support of the Slovenian Research Agency through a grant for the training of young researchers and through grant number P5-0093 (B).

Author information

Authors and Affiliations

Faculty of Administration, University of Ljubljana, Gosarjeva 5, 1000 Ljubljana, Slovenia
Nuša Erman & Ljupčo Todorovski

Corresponding author

Correspondence to Nuša Erman.

About this article

Cite this article

Erman, N., Todorovski, L. The effects of measurement error in case of scientific network analysis. Scientometrics 104, 453–473 (2015). https://doi.org/10.1007/s11192-015-1615-5

Received: 26 August 2014
Published: 31 May 2015
Issue Date: August 2015
DOI: https://doi.org/10.1007/s11192-015-1615-5
