ABSTRACT
In this paper, we address the problem of named entity disambiguation at a very fine-grained level: each entity is assigned the identifier of the Wikipedia article that describes it. For such fine-grained disambiguation, a correct representation of the context is crucial. We compare several context representations: a bag-of-words representation, a linguistic representation, and a structured co-occurrence representation. We describe and evaluate models for each representation.
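To make the bag-of-words idea concrete, the following is a minimal sketch and not the model evaluated in the paper. It assumes each candidate Wikipedia article is summarized by a short description text, and it picks the candidate whose word counts are most similar (by cosine similarity) to the words around the mention. The function names, candidate identifiers, and descriptions are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, candidates):
    """Return the candidate identifier whose description best matches the context."""
    ctx = bag_of_words(context)
    return max(candidates, key=lambda c: cosine(ctx, bag_of_words(candidates[c])))


# Hypothetical candidate identifiers and descriptions (assumptions, not data from the paper).
CANDIDATES = {
    "Paris": "capital city of France on the river Seine",
    "Paris_Hilton": "American media personality and hotel heiress",
}

best = disambiguate(
    "She flew to Paris to see the Seine and the rest of France", CANDIDATES
)
print(best)
```

A linguistic or structured co-occurrence representation would replace `bag_of_words` with richer features; the selection step could stay the same.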
Index Terms
- Various approaches to text representation for named entity disambiguation