DOI: 10.1145/2428736.2428776
research-article

Various approaches to text representation for named entity disambiguation

Published: 03 December 2012

ABSTRACT

In this paper, we focus on the problem of named entity disambiguation. We disambiguate named entities at a very fine level of detail: each entity is assigned the identifier of the Wikipedia article that describes it. For such fine-grained disambiguation, a correct representation of the context is crucial. We compare three context representations: a bag-of-words representation, a linguistic representation, and a structured co-occurrence representation. We describe and evaluate a model for each representation.
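To make the bag-of-words option concrete, here is a minimal sketch. It assumes that each candidate Wikipedia article is available as plain text. The mention's surrounding context and each candidate's text are turned into term-count vectors, and the candidate with the highest cosine similarity wins. This is not the paper's model: its weighting scheme, preprocessing, and its linguistic and co-occurrence representations may differ. The function names, stopword list, and example articles are hypothetical and serve only as illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "was",
             "to", "for", "at", "by", "with"}


def bag_of_words(text: str) -> Counter:
    """Lowercase, split into alphanumeric tokens, drop stopwords, count terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context: str, candidates: dict) -> str:
    """Return the ID of the candidate article whose text best matches the context."""
    ctx = bag_of_words(context)
    return max(candidates, key=lambda cid: cosine(ctx, bag_of_words(candidates[cid])))


# Hypothetical candidate articles, keyed by article identifier.
candidates = {
    "Paris": "Paris is the capital and largest city of France, on the Seine river.",
    "Paris_Hilton": "Paris Hilton is an American media personality, businesswoman and socialite.",
}
context = "She flew to Paris and walked along the Seine toward the city centre of France."
print(disambiguate(context, candidates))  # Paris
```

The example context shares four terms with the first article (paris, seine, city, france) but only one with the second (paris), so the first identifier is chosen. A bag of words ignores word order and syntax, and this is what richer context representations try to capture.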


      Published in

        ACM Other conferences
        IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
        December 2012
        432 pages
        ISBN:9781450313063
        DOI:10.1145/2428736

        Copyright © 2012 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States
