Named Entity Disambiguation: A Hybrid Statistical and Rule-Based Incremental Approach

  • Conference paper
The Semantic Web (ASWC 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5367))

Abstract

The rapidly increasing use of large-scale data on the Web has made named entity disambiguation one of the main challenges for research in Information Extraction and for the development of the Semantic Web. This paper presents a novel method for detecting proper names in a text and linking them to the right entities in Wikipedia. The method is hybrid and has two phases: the first uses heuristics and patterns to narrow down the candidates, and the second employs the vector space model to rank the candidates in the remaining ambiguous cases and choose the right one. The novelty is that the disambiguation process is incremental: it runs in several rounds that filter the candidates by exploiting previously identified entities, extending the text with the attributes of each entity as soon as it is successfully resolved in a round. We test the performance of the proposed method on disambiguating names of people, locations and organizations in texts from the news domain. The experimental results show that our approach achieves high accuracy and can be used to construct a robust named entity disambiguation system.
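The abstract does not specify the heuristics, the patterns, or the term weighting used, so the following is only a minimal sketch of the two-phase, incremental idea under stated assumptions. The function names (`tokens`, `cosine`, `disambiguate`), the toy knowledge base `kb`, the single overlap heuristic, and the use of raw term frequencies are all hypothetical stand-ins rather than the authors' implementation.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def disambiguate(text, kb, max_rounds=10):
    """kb maps each name to {entity_id: attribute string} (hypothetical format)."""
    context = Counter(tokens(text))
    pending = {name: dict(cands) for name, cands in kb.items()}
    resolved = {}

    # Phase 1 (assumed heuristic): keep candidates whose attributes overlap
    # the current context; a name with one surviving candidate is resolved.
    for _ in range(max_rounds):
        newly = {}
        for name, cands in list(pending.items()):
            kept = {e: attrs for e, attrs in cands.items()
                    if any(t in context for t in tokens(attrs))}
            if kept:
                pending[name] = kept
            if len(pending[name]) == 1:
                newly[name] = next(iter(pending[name]))
        if not newly:
            break  # no progress in this round
        # Extend the text with the attributes of entities resolved this round.
        for name, entity in newly.items():
            context.update(tokens(pending.pop(name)[entity]))
            resolved[name] = entity

    # Phase 2: rank the remaining ambiguous candidates with the vector space model.
    for name, cands in pending.items():
        resolved[name] = max(
            cands, key=lambda e: cosine(context, Counter(tokens(cands[e])))
        )
    return resolved


# Toy data, invented for illustration only.
text = "The Bulls won again. Jordan scored 40 points and Jackson praised the defense."
kb = {
    "Bulls": {"Chicago_Bulls": "nba basketball team chicago"},
    "Jackson": {"Phil_Jackson": "nba coach", "Michael_Jackson": "pop singer"},
    "Jordan": {
        "Michael_Jordan": "nba basketball player chicago",
        "DeAndre_Jordan": "nba basketball player dallas",
        "Jordan_(country)": "country amman",
    },
}
result = disambiguate(text, kb)
print(result)
```

In this toy run, "Bulls" is resolved in the first round because it has a single candidate; its attributes then let the second round resolve "Jackson" and discard the country reading of "Jordan". The two remaining "Jordan" candidates are left to the cosine ranking, which uses the extended context.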


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, H.T., Cao, T.H. (2008). Named Entity Disambiguation: A Hybrid Statistical and Rule-Based Incremental Approach. In: Domingue, J., Anutariya, C. (eds) The Semantic Web. ASWC 2008. Lecture Notes in Computer Science, vol 5367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89704-0_29

  • DOI: https://doi.org/10.1007/978-3-540-89704-0_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89703-3

  • Online ISBN: 978-3-540-89704-0

  • eBook Packages: Computer Science, Computer Science (R0)