ABSTRACT
Recognizing names and linking them to structured data is a fundamental task in text analysis. Existing approaches typically perform these two steps using a pipeline architecture: they use a Named-Entity Recognition (NER) system to find the boundaries of mentions in text, and an Entity Linking (EL) system to connect the mentions to entries in structured or semi-structured repositories like Wikipedia. However, the two tasks are tightly coupled, and each type of system can benefit significantly from the kind of information provided by the other. In this proposal, we present a joint model for NER and EL, called NEREL, that takes a large set of candidate mentions from typical NER systems and a large set of candidate entity links from EL systems, and ranks the candidate mention-entity pairs together to make joint predictions. In our initial NER and EL experiments across three datasets, NEREL significantly outperforms or comes close to the performance of two state-of-the-art NER systems, and it outperforms six competing EL systems. On the benchmark MSNBC dataset, NEREL provides a 60% reduction in error over the next-best NER system and a 68% reduction in error over the next-best EL system.
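The idea of ranking candidate mention-entity pairs together can be illustrated with a minimal sketch. This is not the NEREL model: the abstract does not specify its features, scoring function, or inference procedure, so the linear score, the greedy non-overlap selection, the feature values, and all names (`Candidate`, `rank_jointly`, the entity ids) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """One candidate mention-entity pair (hypothetical structure)."""
    start: int                   # token offset where the mention begins (inclusive)
    end: int                     # token offset where the mention ends (exclusive)
    entity: Optional[str]        # illustrative knowledge-base id, or None for "no link"
    features: Tuple[float, ...]  # illustrative feature values, e.g. (NER score, EL score)


def score(candidate: Candidate, weights: Sequence[float]) -> float:
    """Linear score over the candidate's features (an assumed scoring function)."""
    return sum(w * f for w, f in zip(weights, candidate.features))


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two mention spans share at least one token."""
    return a.start < b.end and b.start < a.end


def rank_jointly(candidates: Sequence[Candidate],
                 weights: Sequence[float]) -> List[Candidate]:
    """Greedily keep the highest-scoring pairs whose spans do not overlap.

    Mention boundaries and entity links are chosen in one step, because each
    candidate carries both a span and an entity.
    """
    chosen: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: score(c, weights), reverse=True):
        if not any(overlaps(cand, kept) for kept in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda c: c.start)


# Tokens: ["New", "York", "Times", "reported"]
candidates = [
    Candidate(0, 2, "New_York_City", (0.6, 0.3)),
    Candidate(0, 3, "The_New_York_Times", (0.5, 0.9)),
    Candidate(2, 3, "Time_(magazine)", (0.2, 0.1)),
]
result = rank_jointly(candidates, weights=(1.0, 1.0))
for cand in result:
    print(cand.start, cand.end, cand.entity)
```

In this toy input, a pipeline that commits to the span "New York" before linking could not recover the three-token mention, whereas scoring spans and links together lets the stronger link evidence for the longer span win.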
Index Terms
- Exploring re-ranking approaches for joint named-entity recognition and linking