research-article

Entity-Relationship Queries over Wikipedia

Published: 01 September 2012

Abstract

Wikipedia is the largest user-generated knowledge base. We propose a structured query mechanism, the entity-relationship query, for searching entities in the Wikipedia corpus by their properties and interrelationships. An entity-relationship query consists of multiple predicates on desired entities, and the semantics of each predicate is specified with keywords. Entity-relationship queries search entities directly over text rather than over preextracted structured data stores. This characteristic brings two benefits: (1) query semantics can be expressed intuitively by keywords; (2) only rudimentary entity annotation is required, which is simpler than explicitly extracting and reasoning about complex semantic information before query time. We present a ranking framework for general entity-relationship queries and a position-based Bounded Cumulative Model (BCM) for accurate ranking of query answers. We also explore various weighting schemes for further improving the accuracy of BCM. We test our ideas on a 2008 version of Wikipedia using a collection of 45 queries pooled from the INEX entity ranking track and our own crafted queries. Experiments show that the ranking and weighting schemes are both effective, particularly on multipredicate queries.
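
To make the query structure described in the abstract concrete, the following is a minimal sketch, not the authors' system. It models a query as a list of keyword predicates over entity variables and evaluates it against sentences that carry entity annotations. All names (`Predicate`, `Sentence`, `bindings`, `answer`) and the sample data are hypothetical. The score is a plain co-occurrence count, used here as a stand-in; it is not the position-based Bounded Cumulative Model, whose definition is not given in the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Predicate:
    variables: tuple      # entity variables, e.g. ("x",) or ("x", "y")
    keywords: frozenset   # keywords that express the predicate's meaning


@dataclass(frozen=True)
class Sentence:
    words: frozenset      # lowercased words of the sentence
    entities: frozenset   # entities annotated in the sentence


def bindings(pred, sentences):
    """Count, per variable binding, the sentences containing every keyword."""
    support = Counter()
    for s in sentences:
        if pred.keywords <= s.words:
            for combo in permutations(sorted(s.entities), len(pred.variables)):
                support[combo] += 1
    return support


def answer(predicates, sentences):
    """Join bindings on shared variables; score is the product of counts.

    Assumption: simple co-occurrence counting replaces the paper's ranking.
    """
    results = [({}, 1)]
    for pred in predicates:
        support = bindings(pred, sentences)
        joined = []
        for partial, score in results:
            for combo, count in support.items():
                new = dict(zip(pred.variables, combo))
                # Keep only bindings that agree on already-bound variables.
                if all(partial.get(v, e) == e for v, e in new.items()):
                    joined.append(({**partial, **new}, score * count))
        results = joined
    return sorted(results, key=lambda r: -r[1])


# Hypothetical annotated corpus (illustrative data only).
corpus = [
    Sentence(frozenset("jerry yang founded yahoo".split()),
             frozenset({"Jerry Yang", "Yahoo"})),
    Sentence(frozenset("jerry yang graduated from stanford".split()),
             frozenset({"Jerry Yang", "Stanford University"})),
    Sentence(frozenset("larry page founded google".split()),
             frozenset({"Larry Page", "Google"})),
]

# Query: x founded y, and x graduated from Stanford.
query = [
    Predicate(("x", "y"), frozenset({"founded"})),
    Predicate(("x",), frozenset({"graduated", "stanford"})),
]
results = answer(query, corpus)
print(results)
```

The second predicate shares the variable `x` with the first, so it prunes bindings that the first predicate alone would admit. The sketch omits entity types, so a single relationship predicate accepts both orderings of an entity pair; a typed variant would filter these before the join.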


    • Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 3, Issue 4
      September 2012
      410 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2337542

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2012
      • Accepted: 1 May 2011
      • Revised: 1 March 2011
      • Received: 1 December 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
