DOI: 10.1145/3132218.3132234
research-article

SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous Labels

Published: 11 September 2017

Abstract

Webpages are an abundant source of textual information with manually annotated entity links, and are often used as training data for a wide variety of machine learning NLP tasks. However, manual annotations such as those found on Wikipedia are sparse, noisy, and biased towards popular entities. Existing entity linking systems deal with those issues by relying on simple statistics extracted from the data. While such statistics can effectively deal with noisy annotations, they introduce a bias towards head entities and are ineffective for long-tail (i.e., unpopular) entities. In this work, we first analyze the statistical properties of manual annotations by studying a large annotated corpus composed of all English Wikipedia webpages, in addition to all pages from the CommonCrawl containing English Wikipedia annotations. We then propose and evaluate a series of entity linking approaches, with the explicit goal of creating highly accurate (precision > 95%) and broad annotated corpora for machine learning tasks. Our results show that our best approach achieves maximal precision at usable recall levels, and outperforms both state-of-the-art entity linking systems and human annotators.
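
To make the idea of linking from simple annotation statistics more concrete, the following is a minimal sketch and not the method from the paper. It assumes that annotations are available as (label, entity) pairs, and it links a label only when nearly all of its observed annotations point to the same entity. The function names, the toy data, and the thresholds `min_count` and `min_ratio` are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def build_label_index(annotations):
    """Count how often each (lower-cased) label links to each entity."""
    index = defaultdict(Counter)
    for label, entity in annotations:
        index[label.lower()][entity] += 1
    return index


def link_unambiguous(label, index, min_count=2, min_ratio=0.95):
    """Return an entity only if the label almost always points to it.

    min_count and min_ratio are illustrative thresholds, not values
    taken from the paper.
    """
    counts = index.get(label.lower())
    if not counts:
        return None  # label never seen as an annotation
    entity, top = counts.most_common(1)[0]
    total = sum(counts.values())
    if total >= min_count and top / total >= min_ratio:
        return entity
    return None  # too rare or too ambiguous: abstain


# Toy annotations (hypothetical data).
annotations = [
    ("Zurich", "Zürich"), ("Zurich", "Zürich"), ("Zurich", "Zürich"),
    ("Jaguar", "Jaguar_Cars"), ("Jaguar", "Jaguar"),
]
index = build_label_index(annotations)
print(link_unambiguous("Zurich", index))
print(link_unambiguous("Jaguar", index))
```

Abstaining on ambiguous or rarely seen labels trades recall for precision, which is the trade-off the abstract describes.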

Cited By

  • (2021) Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 3818-3827. DOI: 10.1145/3459637.3481939. Online publication date: 26-Oct-2021
    Published In

    Semantics2017: Proceedings of the 13th International Conference on Semantic Systems
    September 2017
    202 pages
    ISBN:9781450352963
    DOI:10.1145/3132218

    In-Cooperation

    • St. Pölten University: St. Pölten University of Applied Sciences, Austria
    • Wolters Kluwer: Wolters Kluwer, Germany
    • Vrije Universiteit Amsterdam: Vrije Universiteit Amsterdam
    • Semantic Web Company: Semantic Web Company
    • Univ. Leipzig: Universität Leipzig

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Entity Linking
    2. Machine learning
    3. Manual annotations

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    Semantics2017

    Acceptance Rates

    Overall Acceptance Rate 40 of 182 submissions, 22%
