DOI: 10.1145/2872427.2883061

A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries

Published: 11 April 2016

Abstract

In this paper we study the problem of linking open-domain web-search queries to entities drawn from the full inventory of Wikipedia articles. We introduce SMAPH-2, a second-order approach that piggybacks on a web search engine to alleviate the noise and irregularities that characterize the language of queries, and to place each query in a larger context in which it is easier to interpret. The key algorithmic idea underlying SMAPH-2 is to first discover a candidate set of entities and then link those entities back to their mentions in the input query. This confines the concepts considered pertinent to the query to those actually mentioned in it. The link-back is implemented as a collective disambiguation step based on a supervised ranking model that makes one joint prediction for the annotation of the complete query, directly optimizing the F1 measure. We evaluate both known features, such as word embeddings and semantic relatedness among entities, and several novel features, such as an approximate distance between mentions and entities that can handle spelling errors. We demonstrate that SMAPH-2 achieves state-of-the-art performance on the ERD@SIGIR2014 benchmark. We also publish GERDAQ (General Entity Recognition, Disambiguation and Annotation in Queries), a novel public dataset built specifically for web-query entity linking via a crowdsourcing effort. SMAPH-2 also outperforms the benchmarks on GERDAQ by comparable margins.
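
The two-stage idea in the abstract (gather candidate entities first, then link them back to spans of the query) can be illustrated with a minimal sketch. This is not the SMAPH-2 implementation: the paper uses a supervised ranking model that scores whole-query annotations jointly, whereas the sketch below swaps in a greedy threshold heuristic over a normalized edit distance. The function names, the `max_dist` threshold, and the hard-coded candidate list (standing in for entities found through a search engine) are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(mention: str, title: str) -> float:
    """Case-insensitive edit distance scaled to [0, 1] by the longer string."""
    a, b = mention.lower(), title.lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def segments(query: str):
    """Yield (start, end, text) for every contiguous token span of the query."""
    tokens = query.split()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            yield i, j, " ".join(tokens[i:j])


def link_back(query: str, candidates: list[str], max_dist: float = 0.3):
    """Hypothetical greedy link-back: attach each candidate entity to its
    closest query span, drop candidates with no span within max_dist, and
    resolve overlapping spans in favour of the smaller distance."""
    scored = []
    for entity in candidates:
        start, end, text = min(
            segments(query), key=lambda s: normalized_distance(s[2], entity)
        )
        dist = normalized_distance(text, entity)
        if dist <= max_dist:
            scored.append((dist, start, end, text, entity))
    scored.sort(key=lambda item: item[0])

    used_tokens: set[int] = set()
    kept = []
    for dist, start, end, text, entity in scored:
        span = set(range(start, end))
        if used_tokens & span:
            continue  # span already claimed by a closer entity
        used_tokens |= span
        kept.append((start, text, entity))
    # Report annotations in query order as (mention, entity) pairs.
    return [(text, entity) for _, text, entity in sorted(kept)]


# Candidates would normally come from search results; here they are made up.
annotations = link_back(
    "eifel tower paris tickets", ["Eiffel Tower", "Paris", "Louvre"]
)
```

In this toy run the misspelled span "eifel tower" still attaches to "Eiffel Tower" because the distance tolerates one missing letter, while "Louvre" is discarded because no span of the query resembles it. This mirrors, in a simplified way, how link-back restricts the output to entities actually mentioned in the query.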



    Published In

    WWW '16: Proceedings of the 25th International Conference on World Wide Web
    April 2016
    1482 pages
    ISBN:9781450341431

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. entity linking
    2. erd
    3. piggyback
    4. query annotation

    Qualifiers

    • Research-article

    Funding Sources

    • Google
    • EU H2020 Program

    Conference

    WWW '16
    Sponsor:
    • IW3C2
    WWW '16: 25th International World Wide Web Conference
    April 11 - 15, 2016
    Montréal, Québec, Canada

    Acceptance Rates

    WWW '16 Paper Acceptance Rate 115 of 727 submissions, 16%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
