
Health warning: might contain multiple personalities—the problem of homonyms in Thomson Reuters Essential Science Indicators

Scientometrics

Abstract

Author name ambiguity is a crucial problem in any type of bibliometric analysis. It arises when several authors share the same name, but also when one author writes their name in different ways. This article focuses on the former, also called the "namesake" problem. In particular, we assess the extent to which it compromises the Thomson Reuters Essential Science Indicators ranking of the top 1 % most cited authors worldwide. We show that three demographic characteristics that should be unrelated to research productivity (name origin, uniqueness of one's family name, and the number of initials used in publishing) in fact have a very strong influence on apparent productivity. In contrast to what could be expected from Web of Science publication data, researchers with Asian names, and in particular Chinese and Korean names, appear to be far more productive than researchers with Western names. Furthermore, within any country, academics with common names and fewer initials also appear to be more productive than their more uniquely named counterparts. However, this appearance of high productivity arises purely because these "academic superstars" are composites of many individual academics with the same name. We thus argue that it is high time that Thomson Reuters started taking name disambiguation in general, and non-Anglophone names in particular, more seriously.
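The mechanism described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that a ranking groups papers by a key made of the surname plus a limited number of initials; the article does not describe how Essential Science Indicators actually builds its author records, and the `Paper` records, names and the `name_key` helper below are hypothetical. The sketch shows how several distinct people collapse into one "superstar" when surnames are common and fewer initials are used.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Paper(NamedTuple):
    # Hypothetical record: person_id is the true (normally unobservable) author,
    # surname and given_names are the name as printed on the paper.
    person_id: str
    surname: str
    given_names: str


def name_key(paper: Paper, max_initials: int) -> str:
    """Hypothetical author key: surname plus at most max_initials initials."""
    initials = "".join(part[0].upper() for part in paper.given_names.split())
    return f"{paper.surname.upper()} {initials[:max_initials]}"


def apparent_productivity(papers: Iterable[Paper], max_initials: int) -> Counter:
    """Papers per name key; a ranking would treat each key as one author."""
    return Counter(name_key(p, max_initials) for p in papers)


def true_productivity(papers: Iterable[Paper]) -> Counter:
    """Papers per actual person, i.e. what perfect disambiguation would give."""
    return Counter(p.person_id for p in papers)


# Hypothetical toy data: three different people surnamed Wang, one Smith,
# each of whom actually wrote two papers.
papers = [
    Paper("p1", "Wang", "Li"), Paper("p1", "Wang", "Li"),
    Paper("p2", "Wang", "Lei"), Paper("p2", "Wang", "Lei"),
    Paper("p3", "Wang", "Li Na"), Paper("p3", "Wang", "Li Na"),
    Paper("p4", "Smith", "Jane Ann"), Paper("p4", "Smith", "Jane Ann"),
]

print(dict(true_productivity(papers)))         # {'p1': 2, 'p2': 2, 'p3': 2, 'p4': 2}
print(dict(apparent_productivity(papers, 1)))  # {'WANG L': 6, 'SMITH J': 2}
print(dict(apparent_productivity(papers, 2)))  # {'WANG L': 4, 'WANG LN': 2, 'SMITH JA': 2}
```

Under these assumptions every person is equally productive, yet the common surname combined with a single initial produces one apparent author with three times the output of anyone else. Allowing a second initial splits off only the author who uses two initials, mirroring the abstract's point that common names and fewer initials inflate apparent productivity.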


Notes

  1. The increase in the number of papers published between 2005–2009 and 2010–2014 is 98 % for China, 57 % for Korea and only 17 % for the USA (Essential Science Indicators, May 2015).

  2. This obviously understates the prevalence of namesakes, as there were many cases where the same name occurred only 2–4 times.

  3. Names originating in other European countries did not occur in multiples of 5 or more.

  4. Classifying papers by country of affiliation obviously does not give the same result as classifying authors by the origin of their names, as many Asian academics work at Western universities. However, the effects discussed in this paper are so large that this limitation is unlikely to negate our results.

  5. We investigated 43 % of the Chinese names, of which 85 % were conflated. Although conflation is less likely for the remaining, less common, Chinese names, these academics on average still produced significantly more papers than Anglo and European academics. Hence, we consider it very likely that many of them still contain multiple academics.
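The figures in note 5 imply a simple lower bound. Assuming both percentages refer to counts of distinct Chinese names in the list, the share of all Chinese names confirmed to be composites is

$$
0.43 \times 0.85 \approx 0.37,
$$

so roughly 37 % of all Chinese names were verified as conflated, while the remaining 57 % were not checked individually. The true share of conflated names can therefore only be higher than this bound.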


Author information


Correspondence to Anne-Wil Harzing.


About this article


Cite this article

Harzing, AW. Health warning: might contain multiple personalities—the problem of homonyms in Thomson Reuters Essential Science Indicators. Scientometrics 105, 2259–2270 (2015). https://doi.org/10.1007/s11192-015-1699-y


  • DOI: https://doi.org/10.1007/s11192-015-1699-y
