Abstract
Research evaluation is necessary for managing academic units (scientists, research groups, departments, institutes, universities) and for government decision-making in science and technology. Yet wrong conclusions may be drawn when authors are assigned to the wrong institutions. To improve on existing techniques of institution name disambiguation (IND), which rely on word similarity or edit distance, this study proposes a rule-based algorithm. The algorithm uses statistical methods and specific rules to recognize the one-to-many relationship between an institution and the variant names under which it appears in publication bylines. The rule-based IND algorithm is evaluated on large datasets from four fields. The experimental results show that its precision is high, but its recall still needs improvement.
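The abstract contrasts the proposed rule-based approach with existing IND techniques built on word similarity or edit distance, and it reports results in terms of precision and recall. The Python sketch below illustrates those two baseline measures (word-level Jaccard similarity and Levenshtein edit distance) and one possible way to score variant-name assignments. It is not the authors' algorithm: the function names, the example institution names, and the gold and predicted mappings are hypothetical, and the precision/recall definitions are an assumption about how such an evaluation could be set up.

```python
"""Hypothetical sketch: string-similarity baselines for institution names,
plus precision/recall for scoring variant-name -> institution assignments.
Not the rule-based IND algorithm itself; all example data are made up."""


def jaccard_words(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two names."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def precision_recall(predicted: dict[str, str], gold: dict[str, str]) -> tuple[float, float]:
    """Assumed scoring of variant -> institution assignments.

    precision: share of predicted assignments that match the gold institution.
    recall: share of gold variants assigned to the correct institution.
    Variants left unassigned lower recall but not precision.
    """
    correct = sum(1 for variant, inst in predicted.items() if gold.get(variant) == inst)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    full, abbrev, other = "Peking University", "Peking Univ", "Tsinghua University"
    print(jaccard_words(full, abbrev))  # 0.333... (shared word: "peking")
    print(jaccard_words(full, other))   # 0.333... (shared word: "university")
    print(levenshtein(full, abbrev))    # 6

    # Hypothetical gold standard and a conservative prediction that leaves
    # one genuine variant unassigned.
    gold = {"Peking Univ": "PKU", "Beijing Univ": "PKU", "Tsinghua Univ": "THU"}
    predicted = {"Peking Univ": "PKU", "Tsinghua Univ": "THU"}
    print(precision_recall(predicted, gold))  # (1.0, 0.666...)
```

On these toy names, the abbreviation "Peking Univ" receives the same word-level Jaccard score against "Peking University" as the unrelated "Tsinghua University" does, one illustration of how raw string similarity can mislead. In the toy evaluation, the unassigned variant lowers recall to 2/3 while precision stays at 1.0, the same pattern of high precision and weaker recall that the abstract reports.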

Acknowledgments
We would like to thank Qiuru Peng, Hui Lin, Xueqin Jiang, and Zengli She from the College of Information Science and Technology for their work on data verification. This work was supported by Grant No. 13CTQ031 of the National Social Science Fund of China.
Cite this article
Huang, S., Yang, B., Yan, S. et al. Institution name disambiguation for research assessment. Scientometrics 99, 823–838 (2014). https://doi.org/10.1007/s11192-013-1214-2