Abstract
Author name ambiguity is a crucial problem in any type of bibliometric analysis. It arises when several authors share the same name, but also when one author writes their name in different ways. This article focuses on the former, also called the “namesake” problem. In particular, we assess the extent to which this problem compromises the Thomson Reuters Essential Science Indicators ranking of the top 1 % most cited authors worldwide. We show that three demographic characteristics that should be unrelated to research productivity (name origin, uniqueness of one’s family name, and the number of initials used in publishing) in fact have a very strong influence on it. In contrast to what could be expected from Web of Science publication data, researchers with Asian names, and in particular Chinese and Korean names, appear to be far more productive than researchers with Western names. Furthermore, for any country, academics with common names and fewer initials also appear to be more productive than their more uniquely named counterparts. However, this appearance of high productivity is caused purely by the fact that these “academic superstars” are in fact composites of many individual academics with the same name. We thus argue that it is high time that Thomson Reuters started taking name disambiguation in general, and non-Anglophone names in particular, more seriously.
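To make the mechanism concrete, the following is a minimal sketch of how keying authors on family name plus initials can merge distinct people into one apparent "superstar". The records, person identifiers, paper counts, and the helper names `name_key` and `papers_per_name` are all hypothetical illustrations, not data or methods from the article.

```python
from collections import defaultdict

# Hypothetical records (not from the article):
# (person_id, family_name, given_names, paper_count)
RECORDS = [
    ("p1", "Wang", "Jing", 12),
    ("p2", "Wang", "Jun", 9),
    ("p3", "Wang", "Jian Hua", 15),
    ("p4", "Harzing", "Anne Wil", 10),
]


def name_key(family_name, given_names, max_initials=None):
    """Build a 'Family, IJ' style key from a family name and given names."""
    initials = [part[0].upper() for part in given_names.split()]
    if max_initials is not None:
        initials = initials[:max_initials]
    return f"{family_name}, {''.join(initials)}"


def papers_per_name(records, max_initials=None):
    """Sum paper counts per name key and count the distinct people behind it.

    Returns a dict mapping name key -> (total_papers, number_of_people).
    """
    papers = defaultdict(int)
    people = defaultdict(set)
    for person_id, family, given, count in records:
        key = name_key(family, given, max_initials)
        papers[key] += count
        people[key].add(person_id)
    return {key: (papers[key], len(people[key])) for key in papers}


if __name__ == "__main__":
    # One initial only: all three hypothetical Wangs collapse into "Wang, J".
    print(papers_per_name(RECORDS, max_initials=1))
    # All initials: "Wang, JH" separates, but two people still share "Wang, J".
    print(papers_per_name(RECORDS))
```

In this toy data, a common family name combined with a single initial produces one name that appears far more productive than any of the individuals behind it, while the uniquely named author is unaffected. Using more initials reduces the conflation but does not remove it.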
Notes
The increase in the number of papers published between 2005–2009 and 2010–2014 is 98 % for China, 57 % for Korea and only 17 % for the USA (Essential Science Indicators, May 2015).
This understates how often namesakes occur, because in many cases the same name occurred only 2–4 times.
Names originating in other European countries did not occur in multiples of 5 or more.
Classifying papers by country of affiliation does not give the same result as classifying authors by the origin of their name, because many Asian academics work at Western universities. However, the effects discussed in this paper are so large that this limitation is unlikely to negate our results.
We investigated 43 % of the Chinese names, of which 85 % were conflated. Conflation is less likely for the remaining, less common Chinese names, but these academics still produced significantly more papers on average than Anglo and European academics. Hence, we consider it very likely that many of these names still cover multiple academics.
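As a rough illustration of what the last note implies, the share of all Chinese names confirmed as conflated can be bounded from below by multiplying the two percentages. This is a sketch that assumes the 85 % applies to the investigated 43 % only; the function name `confirmed_conflated_share` is hypothetical.

```python
def confirmed_conflated_share(share_investigated, share_conflated):
    """Lower bound on the share of all names confirmed as conflated.

    Assumes `share_conflated` is measured within the investigated subset,
    so names that were not investigated contribute nothing to the bound.
    """
    return share_investigated * share_conflated


# Figures from the note: 43 % investigated, 85 % of those conflated.
lower_bound = confirmed_conflated_share(0.43, 0.85)
print(f"Confirmed conflated share of all Chinese names: about {lower_bound:.3f}")
```

Under that reading, a little over a third of all Chinese names in the sample are confirmed composites, before counting any conflation among the names that were not investigated.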
Cite this article
Harzing, AW. Health warning: might contain multiple personalities—the problem of homonyms in Thomson Reuters Essential Science Indicators. Scientometrics 105, 2259–2270 (2015). https://doi.org/10.1007/s11192-015-1699-y
DOI: https://doi.org/10.1007/s11192-015-1699-y