Abstract
Wikipedia is the largest user-generated knowledge base. We propose a structured query mechanism, the entity-relationship query, for searching entities in the Wikipedia corpus by their properties and interrelationships. An entity-relationship query consists of multiple predicates on the desired entities, and the semantics of each predicate is specified with keywords. Entity-relationship queries search for entities directly over text rather than over pre-extracted structured data stores. This characteristic brings two benefits: (1) query semantics can be expressed intuitively with keywords; (2) only rudimentary entity annotation is required, which is simpler than explicitly extracting and reasoning about complex semantic information before query time. We present a ranking framework for general entity-relationship queries and a position-based Bounded Cumulative Model (BCM) for accurate ranking of query answers. We also explore various weighting schemes for further improving the accuracy of BCM. We test our ideas on a 2008 version of Wikipedia using a collection of 45 queries pooled from the INEX entity ranking track and our own crafted queries. Experiments show that the ranking and weighting schemes are both effective, particularly on multi-predicate queries.
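To make the idea of keyword-specified predicates over entity-annotated text concrete, the following is a minimal Python sketch. It is not the BCM formula, which the abstract does not state. The class names, the 1/(1 + span) proximity score, the cap `bound`, and the product across predicates are all assumptions chosen for illustration; they only mirror the abstract's description of position-based evidence that accumulates up to a bound.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Predicate:
    """One query predicate: the entity variables it constrains plus keywords."""
    variables: Tuple[str, ...]  # ("x",) for a property, ("y", "x") for a relationship
    keywords: Tuple[str, ...]


@dataclass
class AnnotatedSentence:
    """A sentence with rudimentary entity annotation (hypothetical format)."""
    tokens: List[str]          # lower-cased words
    entities: Dict[str, int]   # entity name -> token position of its mention


def evidence_score(sentence: AnnotatedSentence, entities: List[str],
                   keywords: Tuple[str, ...]) -> float:
    """Assumed proximity score: 0 if anything is missing, else 1 / (1 + span)."""
    positions = []
    for entity in entities:
        if entity not in sentence.entities:
            return 0.0
        positions.append(sentence.entities[entity])
    for keyword in keywords:
        if keyword not in sentence.tokens:
            return 0.0
        positions.append(sentence.tokens.index(keyword))
    span = max(positions) - min(positions)
    return 1.0 / (1.0 + span)


def predicate_score(pred: Predicate, binding: Dict[str, str],
                    corpus: List[AnnotatedSentence], bound: float = 1.0) -> float:
    """Accumulate evidence over all sentences, capped at `bound` (assumption)."""
    entities = [binding[v] for v in pred.variables]
    total = sum(evidence_score(s, entities, pred.keywords) for s in corpus)
    return min(total, bound)


def answer_score(predicates: List[Predicate], binding: Dict[str, str],
                 corpus: List[AnnotatedSentence]) -> float:
    """Combine predicate scores by product (assumption), so every predicate matters."""
    score = 1.0
    for pred in predicates:
        score *= predicate_score(pred, binding, corpus)
    return score


# Toy query: company x located in "silicon valley", person y who "founded" x.
query = [
    Predicate(variables=("x",), keywords=("silicon", "valley")),
    Predicate(variables=("y", "x"), keywords=("founded",)),
]
corpus = [
    AnnotatedSentence(["jobs", "founded", "apple", "in", "1976"],
                      {"Steve Jobs": 0, "Apple Inc.": 2}),
    AnnotatedSentence(["apple", "is", "headquartered", "in", "silicon", "valley"],
                      {"Apple Inc.": 0}),
]
binding = {"x": "Apple Inc.", "y": "Steve Jobs"}
print(answer_score(query, binding, corpus))
```

In this sketch a candidate answer is a binding of variables to annotated entities, and an answer that fails any single predicate scores zero. The cap keeps one heavily repeated sentence pattern from dominating the ranking, which is one plausible reading of "bounded cumulative".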