Name Disambiguation Using Semi-supervised Topic Model

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9227)

Abstract

Name ambiguity is attracting increasing attention. As the amount of information available on the Web grows, name disambiguation is becoming one of the most challenging tasks: some persons share the same personal name, and an online search for such a name often returns many irrelevant documents. To address this problem, we propose a semi-supervised topic model (STM) that uses the topic coherence principle to resolve the ambiguity of a name entity. The hierarchical structure information of Wikipedia enriches the semantics of the name entity; information extracted from Wikipedia is organized into a knowledge base, which is used to match the query entity. By utilizing the context of the given query entity, the proposed model attempts to disambiguate its various meanings. Experiments on two real-life datasets show that STM outperforms the baselines (ETM and WPAM), achieving an accuracy of 84.75%. The results indicate that our method is promising for name disambiguation, and our work can provide insights into entity disambiguation.
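To make the idea of matching a query entity's context against a knowledge base concrete, the following is a minimal sketch only. It is not the STM described in the paper: it swaps in a plain bag-of-words cosine similarity for the topic model, and the knowledge-base entries, the example name, and the function names (`tokenize`, `cosine`, `disambiguate`) are all hypothetical, chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy knowledge base: candidate entity -> short description.
# In the paper the knowledge base is built from Wikipedia; these two
# entries are invented stand-ins, not data from the paper.
KNOWLEDGE_BASE = {
    "Michael Jordan (basketball player)":
        "american basketball player nba chicago bulls championships",
    "Michael Jordan (computer scientist)":
        "american scientist machine learning statistics professor berkeley",
}


def disambiguate(context, knowledge_base):
    """Return the candidate whose description best matches the context,
    together with the similarity score of every candidate."""
    ctx = Counter(tokenize(context))
    scores = {
        entity: cosine(ctx, Counter(tokenize(description)))
        for entity, description in knowledge_base.items()
    }
    return max(scores, key=scores.get), scores


best, scores = disambiguate(
    "Michael Jordan published a paper on machine learning and statistics",
    KNOWLEDGE_BASE,
)
print(best)
```

In this toy setup the context shares words only with the second description, so that candidate is selected. A topic model would instead compare latent topic distributions, which can match a context to an entity even when they share no surface words.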

Notes

  1. http://download.csdn.net/detail/bzbcxwp/311639/#comment.

Acknowledgments

This work was supported by NSFC (No. 61170192) and the National College Students’ Innovative and Entrepreneurial Training Program (No. 201410635029). L. Li is the corresponding author of the paper.

Corresponding author

Correspondence to Li Li.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Fu, J., Qiu, J., Wang, J., Li, L. (2015). Name Disambiguation Using Semi-supervised Topic Model. In: Huang, DS., Han, K. (eds) Advanced Intelligent Computing Theories and Applications. ICIC 2015. Lecture Notes in Computer Science, vol 9227. Springer, Cham. https://doi.org/10.1007/978-3-319-22053-6_50

  • DOI: https://doi.org/10.1007/978-3-319-22053-6_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22052-9

  • Online ISBN: 978-3-319-22053-6

  • eBook Packages: Computer Science, Computer Science (R0)
