Abstract
Name ambiguity is attracting increasing attention. As the volume of information available on the Web grows, name disambiguation has become one of the most challenging tasks. For example, different people may share the same personal name, so an online search for that name often returns many irrelevant documents. To address this problem, we propose a semi-supervised topic model (STM) that uses the topic coherence principle to resolve the ambiguity of a named entity. The hierarchical structure information in Wikipedia enriches the semantics of the named entity. Information extracted from Wikipedia is organized and stored in a knowledge base, which is used to match the query entity. By utilizing the context of the given query entity, the proposed model disambiguates among its possible meanings. Experiments on two real-life datasets show that STM outperforms the baselines (ETM and WPAM), with an accuracy of 84.75%. The results indicate that the method is promising for name disambiguation and can offer valuable insights into entity disambiguation.
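To make the idea of matching a query entity against a knowledge base by its context more concrete, here is a minimal Python sketch. It is not the STM described in the abstract: it swaps in plain bag-of-words cosine similarity as a stand-in for topic coherence. The knowledge base entries, the names `KNOWLEDGE_BASE`, `bag_of_words`, `cosine`, and `disambiguate`, and the example query are all hypothetical illustrations, not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base: candidate entity -> descriptive text.
# In the paper this information is extracted from Wikipedia; these
# entries are invented for illustration only.
KNOWLEDGE_BASE = {
    "John Smith (athlete)": "football league club striker goals season",
    "John Smith (researcher)": "machine learning statistics professor university",
}


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, knowledge_base):
    """Score every candidate against the query context; return the best one."""
    query = bag_of_words(context)
    scores = {
        entity: cosine(query, bag_of_words(description))
        for entity, description in knowledge_base.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = disambiguate(
        "Smith published a paper on machine learning statistics",
        KNOWLEDGE_BASE,
    )
    print(best)
    for entity, score in scores.items():
        print(f"{entity}: {score:.3f}")
```

A topic model would replace the raw word overlap with a comparison in a latent topic space, which is what allows contexts that share no surface words with a knowledge base entry to still match it.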
Acknowledgments
This work was supported by NSFC (No. 61170192) and the National College Students’ Innovative and Entrepreneurial Training Program (No. 201410635029). L. Li is the corresponding author of the paper.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Fu, J., Qiu, J., Wang, J., Li, L. (2015). Name Disambiguation Using Semi-supervised Topic Model. In: Huang, DS., Han, K. (eds) Advanced Intelligent Computing Theories and Applications. ICIC 2015. Lecture Notes in Computer Science, vol 9227. Springer, Cham. https://doi.org/10.1007/978-3-319-22053-6_50
DOI: https://doi.org/10.1007/978-3-319-22053-6_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22052-9
Online ISBN: 978-3-319-22053-6
eBook Packages: Computer Science, Computer Science (R0)