ABSTRACT
In this paper we study the problem of linking open-domain web-search queries to entities drawn from the full inventory of Wikipedia articles. We introduce SMAPH-2, a second-order approach that, by piggybacking on a web search engine, alleviates the noise and irregularities that characterize the language of queries and places queries in a larger context in which they are easier to interpret. The key algorithmic idea underlying SMAPH-2 is to first discover a candidate set of entities and then link those entities back to their mentions in the input query. This confines the concepts considered pertinent to the query to those actually mentioned in it. The link-back is implemented via a collective disambiguation step based on a supervised ranking model that makes one joint prediction for the annotation of the complete query, directly optimizing the F1 measure. We evaluate both known features, such as word embeddings and semantic relatedness among entities, and several novel features, such as an approximate distance between mentions and entities (which can handle spelling errors). We demonstrate that SMAPH-2 achieves state-of-the-art performance on the ERD@SIGIR2014 benchmark. We also publish GERDAQ (General Entity Recognition, Disambiguation and Annotation in Queries), a novel, public dataset built specifically for web-query entity linking via a crowdsourcing effort. SMAPH-2 outperforms the benchmarks by comparable margins on GERDAQ as well.
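To make the link-back idea concrete, the following is a minimal Python sketch, not the SMAPH-2 implementation. It assumes the candidate entities have already been discovered (for example from search-engine results) and replaces the supervised ranking model with a simple threshold on a normalized edit distance between query segments and entity titles. The function names, the use of entity titles as the only evidence, and the `max_distance` value are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Case-insensitive edit distance scaled to [0, 1] by the longer string."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def segments(query: str) -> List[str]:
    """All contiguous token spans of the query, as candidate mentions."""
    tokens = query.split()
    return [" ".join(tokens[i:j])
            for i in range(len(tokens))
            for j in range(i + 1, len(tokens) + 1)]


def link_back(query: str, candidates: List[str],
              max_distance: float = 0.25) -> List[Tuple[str, str]]:
    """Keep only candidates whose title approximately matches a query segment.

    ASSUMPTION: a fixed threshold stands in for the learned joint ranking
    model described in the abstract.
    """
    annotations = []
    for entity in candidates:
        best = min(segments(query),
                   key=lambda s: normalized_distance(s, entity),
                   default=None)
        if best is not None and normalized_distance(best, entity) <= max_distance:
            annotations.append((best, entity))
    return annotations


# The misspelled mention still links; the unmentioned candidate is dropped.
print(link_back("barak obama wife", ["Barack Obama", "Michelle Obama"]))
# → [('barak obama', 'Barack Obama')]
```

In this toy run the candidate "Michelle Obama" is discarded because no segment of the query is close to its title, which mirrors the goal of confining the annotation to entities actually mentioned in the query. A per-candidate threshold cannot resolve overlapping mentions or trade precision against recall, which is what a joint prediction over the whole query is meant to address.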