Abstract
Specifying and correlating institution information is necessary for research evaluation and resource sharing. Current attempts focus mainly on institution name disambiguation (IND) based on institution name, address, author, and similar features, and they lack a unified, universal indicator. To strengthen the correlation of institution information, this study introduces institutional persistent identifiers (PIDs), together with a tool redesigned from existing IND techniques. An institution metadata specification model, which inherits from several authoritative metadata standards, is built for data preprocessing. Further, a visual platform is implemented to present the correlated institution information and to support institution queries. The proposed approach is evaluated on large datasets from three countries, and the test results show high precision and recall.
Acknowledgements
This work was partially supported by the “Design and Research on A Next Generation of Open Knowledge Services System and Key Technologies” project (No. 2019XM55) and the “Basic Research Business Fee Project of Chinese Academy of Agricultural Science” project (No. Y2019PT15).
About this article
Cite this article
Huang, Y., Li, J., Sun, T. et al. Institution information specification and correlation based on institutional PIDs and IND tool. Scientometrics 122, 381–396 (2020). https://doi.org/10.1007/s11192-019-03268-9
DOI: https://doi.org/10.1007/s11192-019-03268-9