DOI: 10.1145/2872427.2883061

A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries

Published: 11 April 2016

ABSTRACT

In this paper we study the problem of linking open-domain web-search queries to entities drawn from the full entity inventory of Wikipedia articles. We introduce SMAPH-2, a second-order approach that, by piggybacking on a web search engine, alleviates the noise and irregularities that characterize the language of queries and places each query in a larger context in which it is easier to interpret. The key algorithmic idea underlying SMAPH-2 is to first discover a candidate set of entities and then link those entities back to their mentions in the input query. This confines the concepts considered pertinent to the query to those actually mentioned in it. The link-back is implemented as a collective disambiguation step based on a supervised ranking model that makes one joint prediction for the annotation of the complete query, directly optimizing the F1 measure. We evaluate both known features, such as word embeddings and semantic relatedness among entities, and several novel features, such as an approximate distance between mentions and entities (which can handle spelling errors). We demonstrate that SMAPH-2 achieves state-of-the-art performance on the ERD@SIGIR2014 benchmark. We also publish GERDAQ (General Entity Recognition, Disambiguation and Annotation in Queries), a novel public dataset built specifically for web-query entity linking via a crowdsourcing effort. SMAPH-2 also outperforms the benchmarks by comparable margins on GERDAQ.
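
Two ideas from the abstract can be illustrated with a small sketch: an approximate distance between a query mention and an entity title that tolerates spelling errors, and the F1 measure computed over a set of predicted annotations. The abstract does not say which distance SMAPH-2 uses or how annotations are matched against the gold standard, so this sketch assumes a length-normalized Levenshtein distance and exact matching of (mention, entity) pairs. The names `edit_distance`, `normalized_distance`, `f1`, and `Annotation` are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(mention: str, title: str) -> float:
    """Case-insensitive edit distance scaled to [0, 1] (assumed normalization)."""
    m, t = mention.lower(), title.lower()
    longest = max(len(m), len(t))
    return edit_distance(m, t) / longest if longest else 0.0


# Assumed representation of one annotation: (mention text, entity title).
Annotation = Tuple[str, str]


def f1(predicted: Set[Annotation], gold: Set[Annotation]) -> float:
    """F1 of a predicted annotation set against a gold set (exact match)."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a misspelled mention such as "armstrang" is one substitution away from the title "Armstrong", so its normalized distance is small, and a feature of this kind could let a ranking model still bind the mention to the intended entity.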

Published in

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431

Copyright © 2016 held by the International World Wide Web Conference Committee (IW3C2)

Publisher: International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '16 paper acceptance rate: 115 of 727 submissions (16%). Overall acceptance rate: 1,899 of 8,196 submissions (23%).
