research-article

Various approaches to text representation for named entity disambiguation

Authors:
Ivo Lašek
Czech Technical University in Prague and Charles University in Prague, Prague, Czech Republic

Peter Vojtáš
Charles University in Prague, Prague, Czech Republic

IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services, December 2012, Pages 256–262. https://doi.org/10.1145/2428736.2428776

Published: 3 December 2012

ABSTRACT

In this paper, we focus on the problem of named entity disambiguation. We disambiguate named entities at a very detailed level: each entity is assigned the concrete identifier of the Wikipedia article that describes it. For such fine-grained disambiguation, a correct representation of the context is crucial. We compare three context representations: a bag-of-words representation, a linguistic representation, and a structured co-occurrence representation. Models for each representation are described and evaluated.
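
To make the bag-of-words idea concrete, the following is a minimal sketch and not the authors' model: it assumes each candidate Wikipedia identifier comes with a short textual description, and it picks the candidate whose description has the highest cosine similarity to the words surrounding the mention. The function names (`bag_of_words`, `cosine`, `disambiguate`), the candidate identifiers, and the toy descriptions are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, candidates):
    """Return the candidate identifier whose description best matches the context."""
    context_vector = bag_of_words(context)
    return max(
        candidates,
        key=lambda identifier: cosine(
            context_vector, bag_of_words(candidates[identifier])
        ),
    )


# Hypothetical candidates for the surface form "Paris" (toy descriptions,
# not real Wikipedia article text).
CANDIDATES = {
    "Paris": "capital city of France on the river Seine",
    "Paris_Hilton": "American media personality and hotel heiress",
}

best = disambiguate("The river Seine flows through the city of Paris", CANDIDATES)
print(best)
```

In this toy setting the context shares several words with the first description and none with the second, so the first identifier is selected. A realistic system would likely weight terms (for example with TF-IDF) rather than use raw counts; that choice is an assumption here, not something stated in the abstract.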

Index Terms

Various approaches to text representation for named entity disambiguation
1. Applied computing
  1. Document management and text processing
    1. Document management
      1. Text editing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Published in
IIWAS '12: Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
December 2012
432 pages
ISBN:9781450313063
DOI:10.1145/2428736
General Chair:
Eric Pardede
La Trobe University, Australia
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
linked data
models
named entity recognition and disambiguation
text annotation