Entity-Relationship Queries over Wikipedia

Published: 01 September 2012

Abstract

Wikipedia is the largest user-generated knowledge base. We propose a structured query mechanism, the entity-relationship query, for searching entities in the Wikipedia corpus by their properties and interrelationships. An entity-relationship query consists of multiple predicates on the desired entities, and the semantics of each predicate is specified with keywords. Entity-relationship queries search for entities directly over text rather than over pre-extracted structured data stores. This characteristic brings two benefits: (1) query semantics can be expressed intuitively with keywords; (2) only rudimentary entity annotation is required, which is simpler than explicitly extracting and reasoning about complex semantic information before query time. We present a ranking framework for general entity-relationship queries and a position-based Bounded Cumulative Model (BCM) for accurate ranking of query answers. We also explore various weighting schemes for further improving the accuracy of BCM. We test our ideas on a 2008 version of Wikipedia using a collection of 45 queries pooled from the INEX entity ranking track and our own crafted queries. Experiments show that the ranking and weighting schemes are both effective, particularly on multipredicate queries.
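
To make the query structure described in the abstract concrete, the following is a minimal sketch of a query with typed entity variables and keyword predicates, evaluated directly over entity-annotated sentences. It is an illustration only: the names `Predicate`, `Sentence`, `supports`, and `answer`, the sentence-level co-occurrence test, and the toy corpus are all assumptions, not the paper's system. In particular, it performs plain boolean matching and does not reproduce the ranking framework or the Bounded Cumulative Model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Predicate:
    variables: tuple  # entity variables the predicate constrains, e.g. ("x", "y")
    keywords: tuple   # lowercase keywords that express the predicate's meaning


@dataclass
class Sentence:
    tokens: tuple     # lowercase tokens of the sentence
    entities: dict    # token position -> (entity name, entity type)


def supports(sentence, predicate, binding):
    """True if the sentence mentions every bound entity and every keyword."""
    mentioned = {name for name, _ in sentence.entities.values()}
    has_entities = all(binding[v] in mentioned for v in predicate.variables)
    has_keywords = all(k in sentence.tokens for k in predicate.keywords)
    return has_entities and has_keywords


def answer(query_types, predicates, corpus):
    """Return variable bindings for which every predicate has a supporting sentence."""
    # Candidate entities per variable: all annotated entities of the required type.
    candidates = {
        var: sorted({name for s in corpus
                     for name, etype in s.entities.values() if etype == wanted})
        for var, wanted in query_types.items()
    }
    variables = list(query_types)
    results = []
    for combo in product(*(candidates[v] for v in variables)):
        binding = dict(zip(variables, combo))
        if all(any(supports(s, p, binding) for s in corpus) for p in predicates):
            results.append(binding)
    return results


# Toy annotated corpus (hypothetical data standing in for annotated Wikipedia text).
corpus = [
    Sentence(("apple", "is", "headquartered", "in", "silicon", "valley"),
             {0: ("Apple", "COMPANY")}),
    Sentence(("steve", "jobs", "founded", "apple"),
             {0: ("Steve Jobs", "PERSON"), 3: ("Apple", "COMPANY")}),
    Sentence(("bill", "gates", "founded", "microsoft"),
             {0: ("Bill Gates", "PERSON"), 3: ("Microsoft", "COMPANY")}),
    Sentence(("microsoft", "is", "based", "in", "redmond"),
             {0: ("Microsoft", "COMPANY")}),
]

# Query: a company x described by "silicon valley", and a person y who "founded" x.
query_types = {"x": "COMPANY", "y": "PERSON"}
predicates = [
    Predicate(("x",), ("silicon", "valley")),
    Predicate(("x", "y"), ("founded",)),
]

results = answer(query_types, predicates, corpus)
print(results)
```

The second predicate shares the variable `x` with the first, which is what makes this a multipredicate query: a binding is returned only if the same company satisfies both predicates. A ranking model would replace the boolean `supports` test with a score, for example one based on the positions of the keywords and entity mentions.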

    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 3, Issue 4
    September 2012
    410 pages
    ISSN:2157-6904
    EISSN:2157-6912
    DOI:10.1145/2337542
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 September 2012
    Accepted: 01 May 2011
    Revised: 01 March 2011
    Received: 01 December 2010
    Published in TIST Volume 3, Issue 4

    Author Tags

    1. Entity search and ranking
    2. Wikipedia
    3. structured entity query

    Qualifiers

    • Research-article
    • Research
    • Refereed
