A Pseudo-Relevance Feedback Based Method to Find Comprehensive Web Documents

Prasath, Rajendra; Sarkar, Sudeshna

doi:10.1007/978-3-642-45114-0_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8265)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

In web search, a search engine is required to retrieve a set of documents relevant to a given query. We wish to rank documents based on their content and to look beyond mere relevance. Users often want comprehensive documents that contain a variety of aspects of the information relevant to the query topic. Given a query, a document is considered comprehensive only if it covers a large number of aspects of that query. The comprehensiveness of a web document may be estimated by analyzing various parts of its content and checking the diversity and coverage of the content as well as its relevance. In this work, we propose an information retrieval system that ranks documents based on the comprehensiveness of their content. We use pseudo-relevance feedback to score the comprehensiveness of web documents as well as their relevance. Experiments show that the proposed method effectively identifies documents with comprehensive content.
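The abstract does not give the scoring formulas, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes that frequent non-query terms in the top-ranked documents can stand in for "aspects" of the query, and that the final score is a linear mix of relevance and aspect coverage. The function names, the stopword list, and the parameters `alpha`, `top_k`, and `num_terms` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on", "with"}


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def relevance(query_terms, tokens):
    """Toy relevance: fraction of query terms that occur in the document."""
    if not query_terms:
        return 0.0
    present = set(tokens)
    return sum(1 for q in query_terms if q in present) / len(query_terms)


def feedback_terms(query_terms, token_lists, top_k=3, num_terms=5):
    """Pseudo-relevance feedback step (assumed form): treat the top_k
    initially ranked documents as relevant and return their most common
    non-query terms, counted by document frequency."""
    ranked = sorted(token_lists, key=lambda toks: relevance(query_terms, toks), reverse=True)
    counts = Counter()
    for toks in ranked[:top_k]:
        counts.update(set(toks) - set(query_terms))
    # Break ties alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:num_terms]]


def coverage(aspect_terms, tokens):
    """Fraction of the feedback (aspect) terms that the document contains."""
    if not aspect_terms:
        return 0.0
    present = set(tokens)
    return sum(1 for a in aspect_terms if a in present) / len(aspect_terms)


def rank_by_comprehensiveness(query, docs, alpha=0.5, top_k=3, num_terms=5):
    """Return document indices ordered by alpha * relevance + (1 - alpha) * coverage."""
    query_terms = tokenize(query)
    token_lists = [tokenize(d) for d in docs]
    aspects = feedback_terms(query_terms, token_lists, top_k, num_terms)
    scored = [
        (alpha * relevance(query_terms, toks) + (1 - alpha) * coverage(aspects, toks), i)
        for i, toks in enumerate(token_lists)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored]
```

With this toy scoring, a document that matches the query and also mentions most of the feedback terms is ranked above one that only repeats the query terms. A real system would replace the term-overlap relevance with a proper retrieval model and would need a better notion of aspects than single terms.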

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721 302, India
Rajendra Prasath & Sudeshna Sarkar

Authors

Rajendra Prasath
Sudeshna Sarkar

Editor information

Editors and Affiliations

Universidad Autónoma del Estado de Hidalgo, Ciudad Universitaria, Carretera Pachuca–Tulancingo km 4.5, Hidalgo, Mexico
Félix Castro
Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan Dios Bátiz s/n, Col. Nueva Industrial Vallejo, 07738, Mexico City, Mexico
Alexander Gelbukh
Tecnológico de Monterrey, Campus Estado de México, Carretera Lago de Guadalupe Km 3.5, Atizapán de Zaragoza, CP 52926, Estado de México, Mexico
Miguel González

About this paper

Cite this paper

Prasath, R., Sarkar, S. (2013). A Pseudo-Relevance Feedback Based Method to Find Comprehensive Web Documents. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_29

DOI: https://doi.org/10.1007/978-3-642-45114-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0
eBook Packages: Computer Science, Computer Science (R0)