Institution name disambiguation for research assessment

Scientometrics

Abstract

Research evaluation is necessary for managing academic units (scientists, research groups, departments, institutes, universities) and for government decision making in science and technology. Yet errors in assigning authors to institutions can lead to wrong conclusions. To improve on existing techniques of institution name disambiguation (IND), which rely on word similarity or edit distance, this study proposes a rule-based algorithm. It uses statistical methods and specific rules to recognize the one-to-many relationship between an institution and the variant names under which it appears in publication bylines. The performance of the rule-based IND algorithm is evaluated on large datasets in four fields. The experimental results show that the precision of the algorithm is high, but that recall still needs to be improved.
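
To make the baseline techniques named in the abstract concrete, the following is a minimal Python sketch of word similarity (here the Jaccard index over word tokens) and edit distance (Levenshtein distance) applied to institution name strings. It is not the rule-based algorithm proposed in the paper; the function names `tokens`, `jaccard`, and `levenshtein` and the sample name strings are assumptions chosen for illustration only.

```python
import re


def tokens(name: str) -> set[str]:
    """Lowercase a name and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard index of the word-token sets of two names (1.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical byline variants, not taken from the paper's datasets.
same_institution = ("Univ Calif Berkeley", "University of California, Berkeley")
different_institutions = ("Univ Calif Berkeley", "Univ Calif Davis")

for left, right in (same_institution, different_institutions):
    print(left, "|", right, "|", round(jaccard(left, right), 3), "|", levenshtein(left, right))
```

With these illustrative strings, the two variants of the same institution share only one of six distinct tokens (a Jaccard index of 1/6), while the two different institutions share two of four (0.5). Pure string similarity can therefore rank a wrong match above a correct one, which is the kind of error that additional rules are meant to reduce.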

Acknowledgments

We would like to thank Qiuru Peng, Hui Lin, Xueqin Jiang, and Zengli She from the college of information science and technology for their work on data verification. The authors are supported by Grant No. 13CTQ031 of the National Social Science Fund of China.

Author information

Corresponding author

Correspondence to Bo Yang.

About this article

Cite this article

Huang, S., Yang, B., Yan, S. et al. Institution name disambiguation for research assessment. Scientometrics 99, 823–838 (2014). https://doi.org/10.1007/s11192-013-1214-2

  • DOI: https://doi.org/10.1007/s11192-013-1214-2
