DOI: 10.1145/1498759.1498807
research-article

Wikipedia pages as entry points for book search

Published: 09 February 2009

Abstract

A lot of the world's knowledge is stored in books, which, as a result of recent mass-digitisation efforts, are increasingly available online. Search engines, such as Google Books, provide mechanisms for searchers to enter this vast knowledge space using queries as entry points. In this paper, we view Wikipedia as a summary of this world knowledge and aim to use this resource to guide users to relevant books. Thus, we investigate possible ways of using Wikipedia as an intermediary between the user's query and a collection of books being searched. We experiment with traditional query expansion techniques, exploiting Wikipedia articles as rich sources of information that can augment the user's query. We then propose a novel approach based on link distance in an extended Wikipedia graph: we associate books with Wikipedia pages that cite these books and use the link distance between these nodes and the pages that match the user query as an estimation of a book's relevance to the query. Our results show that a) classical query expansion using terms extracted from query pages leads to increased precision, and b) link distance between query and book pages in Wikipedia provides a good indicator of relevance that can boost the retrieval score of relevant books in the result ranking of a book search engine.
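The link-distance idea in the abstract can be illustrated with a minimal sketch. It assumes a toy directed link graph, hypothetical page and book names, and an additive boost of `weight / (1 + distance)`; none of these come from the paper, whose actual graph construction and scoring function are not given in the abstract.

```python
from collections import deque


def link_distances(graph, query_pages):
    """Multi-source breadth-first search.

    Returns the smallest number of links needed to reach each page
    from any of the pages that match the query.
    """
    dist = {page: 0 for page in query_pages}
    queue = deque(dist)
    while queue:
        page = queue.popleft()
        for neighbour in graph.get(page, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[page] + 1
                queue.append(neighbour)
    return dist


def book_distance(citing_pages, dist):
    """Distance of a book: the closest page that cites it, or None if unreachable."""
    reachable = [dist[page] for page in citing_pages if page in dist]
    return min(reachable) if reachable else None


def boosted_score(base_score, distance, weight=1.0):
    """Assumed combination rule: closer books receive a larger additive boost."""
    if distance is None:
        return base_score
    return base_score + weight / (1 + distance)


# Hypothetical toy data: outgoing links per page, and the pages citing each book.
graph = {
    "Query page A": ["Page B", "Page C"],
    "Page B": ["Page D"],
    "Page C": [],
    "Page D": [],
}
citing_pages = {
    "book-1": ["Page C"],
    "book-2": ["Page D"],
    "book-3": ["Page Z"],  # cited only by a page that is not reachable
}

dist = link_distances(graph, ["Query page A"])
distances = {book: book_distance(pages, dist) for book, pages in citing_pages.items()}
scores = {book: boosted_score(2.0, d) for book, d in distances.items()}
```

With equal base scores, the book cited one link away from the query page ranks above the one cited two links away, and the unreachable book keeps its base score. Treating links as directed is also an assumption; an undirected variant would add each link in both directions before running the search.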

Published In

WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
February 2009
314 pages
ISBN: 9781605583907
DOI: 10.1145/1498759
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Wikipedia
  2. domain specific
  3. link graph
  4. query expansion

Qualifiers

  • Research-article

Conference

WSDM'09

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 08 Feb 2025

Cited By

  • (2020) Enhancing information retrieval performance by using social analysis. Social Network Analysis and Mining 10:1. DOI: 10.1007/s13278-020-00635-w. Online publication date: 8-Apr-2020
  • (2019) Proposal of a Novel Book Content Search Method for Web-Based EBook Libraries. 2019 Seventh International Symposium on Computing and Networking Workshops (CANDARW), pages 420-424. DOI: 10.1109/CANDARW.2019.00079. Online publication date: Nov-2019
  • (2017) Aggregated Search. Foundations and Trends in Information Retrieval 10:5, pages 365-502. DOI: 10.1561/1500000052. Online publication date: 6-Mar-2017
  • (2017) Is Wikipedia a Latent Gene Ontology? 2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pages 164-169. DOI: 10.1109/WETICE.2017.19. Online publication date: Jun-2017
  • (2016) Utilising Semantically Rich Big Data to Enhance Book Recommendation Engines. 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pages 1434-1441. DOI: 10.1109/HPCC-SmartCity-DSS.2016.0204. Online publication date: Dec-2016
  • (2015) A LOD-based, query construction and refinement service for web search engines. Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, pages 1-8. DOI: 10.1145/2797115.2797122. Online publication date: 13-Jul-2015
  • (2015) Exploiting Wikipedia for Information Retrieval Tasks. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1137-1140. DOI: 10.1145/2766462.2767879. Online publication date: 9-Aug-2015
  • (2014) Wikipedia in the eyes of its beholders: A systematic review of scholarly research on Wikipedia readers and readership. Journal of the Association for Information Science and Technology 65:12, pages 2381-2403. DOI: 10.1002/asi.23162. Online publication date: 8-Jul-2014
  • (2013) Method of Lexical Enrichment in Information Retrieval System in Arabic. International Journal of Information Retrieval Research 3:4, pages 35-51. DOI: 10.4018/ijirr.2013100103. Online publication date: 1-Oct-2013
  • (2013) Mashups for Web Search Engines. Semantic Mashups, pages 91-117. DOI: 10.1007/978-3-642-36403-7_3. Online publication date: 2013
