DOI:10.1145/1859127.1859133

Querying Wikipedia documents and relationships

Published: 06 June 2010

Abstract

Wikipedia has become an important and rapidly growing source of information. However, the existing infrastructure for querying this information is limited and often ignores both the inherent structure of the information and the links across documents. In this paper, we present a new approach for querying Wikipedia content that supports a simple yet expressive query interface allowing both keyword and structured queries. A unique feature of our approach is that, besides returning documents that match the queries, it also exploits relationships among documents to return richer, multi-document answers. We model Wikipedia as a graph and cast the problem of finding answers to queries as graph search. To guide the answer-search process, we propose a novel weighting scheme to identify important nodes and edges in the graph. By leveraging the structured information available in infoboxes, our approach supports queries that specify constraints over this structure, and we propose a new search algorithm to support these queries. We evaluate our approach using a representative subset of Wikipedia documents and present results showing that our approach is effective and derives high-quality answers.
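
To make the idea of casting keyword queries as graph search concrete, the following is a minimal sketch and not the algorithm from the paper, whose weighting scheme and search procedure are not specified in the abstract. It assumes a hypothetical toy graph (`PAGES`, `LINKS`) with made-up edge costs, and it swaps in a simple heuristic: run a multi-source Dijkstra search per keyword, then pick the page whose summed distance to a match for every keyword is smallest. That page acts as the root of a multi-document answer.

```python
import heapq

# Hypothetical toy data (an assumption for illustration only):
# page title -> page text, and undirected links with assumed costs
# (a lower cost stands for a stronger relationship).
PAGES = {
    "Innsbruck": "city in austria alps university",
    "Leipzig": "city in germany university",
    "Austria": "country in europe",
    "Germany": "country in europe",
    "Europe": "continent",
}
LINKS = {
    ("Innsbruck", "Austria"): 1.0,
    ("Leipzig", "Germany"): 1.0,
    ("Austria", "Germany"): 1.5,
    ("Austria", "Europe"): 2.0,
    ("Germany", "Europe"): 2.5,
}


def build_adjacency(links):
    """Turn the edge dictionary into an adjacency list."""
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))
    return adj


def shortest_distances(adj, sources):
    """Multi-source Dijkstra: cost from each reachable node to its nearest source."""
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def best_root(pages, links, keywords):
    """Return (page, total_cost) for the page closest to all keywords, or None."""
    adj = build_adjacency(links)
    per_keyword = []
    for kw in keywords:
        # A page matches if the keyword is its title or a word in its text.
        matches = [t for t, text in pages.items()
                   if kw == t.lower() or kw in text.split()]
        if not matches:
            return None
        per_keyword.append(shortest_distances(adj, matches))
    candidates = [n for n in pages if all(n in d for d in per_keyword)]
    if not candidates:
        return None
    root = min(candidates, key=lambda n: (sum(d[n] for d in per_keyword), n))
    return root, sum(d[root] for d in per_keyword)


print(best_root(PAGES, LINKS, ["innsbruck", "leipzig", "continent"]))
```

With these assumed costs, the query connects the Innsbruck, Leipzig and Europe pages through Austria, so the answer spans several linked documents rather than a single matching page. A realistic system would derive node and edge weights from the data instead of hard-coding them.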

Cited By

  • (2020) Exploring the coming repositories of reproducible experiments. Proceedings of the VLDB Endowment, 10.14778/3402755.3402804, 4:12 (1494-1497). Online publication date: 3-Jun-2020
  • (2012) Clustering Wikipedia infoboxes to discover their types. Proceedings of the 21st ACM international conference on Information and knowledge management, 10.1145/2396761.2398588 (2134-2138). Online publication date: 29-Oct-2012

Published In

WebDB '10: Proceedings of the 13th International Workshop on the Web and Databases
June 2010
88 pages
ISBN:9781450301862
DOI:10.1145/1859127

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '10

Acceptance Rates

Overall Acceptance Rate 30 of 100 submissions, 30%
