ABSTRACT
Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base (KB) construction and management. However, performing entity resolution at large scales is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. The conventional wisdom has been that performing coreference at these scales requires decomposing the problem: first solving the simpler task of entity-linking (matching a set of mentions to a known set of KB entities), and then performing entity discovery as a post-processing step (to identify new entities not present in the KB). We argue that this traditional approach is harmful to both entity-linking and overall coreference accuracy, and we therefore embrace the challenge of jointly modeling entity-linking and entity-discovery as a single entity resolution problem. To make progress towards scalability, we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck inherent in map-reduce architectures. We demonstrate that more test-time data actually improves the accuracy of coreference, and show that joint coreference is substantially more accurate than traditional entity-linking, reducing error by 75%.
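To make the joint framing concrete, the following is a minimal toy sketch, not the model or inference architecture described in the abstract. It assumes mentions and KB entities are represented as bags of context tokens, uses Jaccard overlap as a stand-in similarity, and clusters greedily in a single pass. The names `jaccard`, `resolve`, `threshold`, and the `NEW_n` labels are hypothetical and chosen only for illustration. The point it shows is that linking and discovery happen in one pass: a mention either joins an existing cluster (a KB entity or a previously discovered one) or starts a new entity.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(kb: Dict[str, Set[str]], mentions: List[Set[str]],
            threshold: float = 0.3) -> List[str]:
    """Assign each mention to a KB entity or to a newly discovered entity.

    Toy assumption: greedy, single-pass clustering with a fixed threshold.
    """
    # Clusters start as the KB entities; discovered entities are added as we go.
    clusters: Dict[str, Set[str]] = {name: set(tokens) for name, tokens in kb.items()}
    assignments: List[str] = []
    new_count = 0
    for tokens in mentions:
        # Best-matching cluster among KB entities and discovered entities alike.
        best = max(clusters, key=lambda c: jaccard(clusters[c], tokens), default=None)
        if best is not None and jaccard(clusters[best], tokens) >= threshold:
            clusters[best] |= tokens  # the mention's context enriches the cluster
        else:
            new_count += 1
            best = f"NEW_{new_count}"  # hypothetical label for a discovered entity
            clusters[best] = set(tokens)
        assignments.append(best)
    return assignments


kb = {"Michael_Jordan_(basketball)": {"jordan", "basketball", "bulls", "nba"}}
mentions = [
    {"jordan", "bulls", "nba"},
    {"jordan", "berkeley", "machine", "learning"},
    {"jordan", "berkeley", "statistics"},
]
result = resolve(kb, mentions)
print(result)
```

In this example the second mention does not match the only KB entity well enough and becomes a new entity, and the third mention then attaches to that discovered entity rather than being forced onto the KB entry. A pipeline that links first and discovers afterwards has no discovered clusters available at linking time, which is the kind of interaction the joint formulation is meant to capture.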
Index Terms
- A joint model for discovering and linking entities