Abstract
In web search, a search engine is required to retrieve a set of documents relevant to a given query. We wish to rank documents by their content and look beyond mere relevance. Users often want comprehensive documents that cover a variety of aspects of the information relevant to the query topic. Given a query, we consider a document more comprehensive the more aspects of the query it covers. The comprehensiveness of a web document may be estimated by analyzing various parts of its content and checking its diversity, coverage, and relevance. In this work, we propose an information retrieval system that ranks documents by the comprehensiveness of their content. We use pseudo-relevance feedback to score both the comprehensiveness and the relevance of web documents. Experiments show that the proposed method effectively identifies documents with comprehensive content.
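The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the aspects of a query can be approximated by the terms that occur in the most top-ranked documents of an initial retrieval (the pseudo-relevance feedback set), and that comprehensiveness is the fraction of those terms a document mentions. The function names `feedback_terms`, `comprehensiveness`, and `rerank`, and the parameters `k` and `n_terms`, are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Lowercase whitespace tokens with surrounding punctuation stripped.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def feedback_terms(ranked_docs, query, k=2, n_terms=5):
    """Assumed aspect proxy: the n_terms non-query terms that appear in the
    largest number of the top-k documents (ties broken alphabetically)."""
    query_terms = set(tokenize(query))
    doc_freq = Counter()
    for doc in ranked_docs[:k]:
        doc_freq.update(t for t in set(tokenize(doc)) if t not in query_terms)
    ordered = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:n_terms]]

def comprehensiveness(doc, aspects):
    """Fraction of the proxy aspects that the document mentions."""
    if not aspects:
        return 0.0
    doc_terms = set(tokenize(doc))
    return sum(1 for a in aspects if a in doc_terms) / len(aspects)

def rerank(ranked_docs, query, k=2, n_terms=5):
    """Re-rank by the comprehensiveness score; the sort is stable, so
    documents with equal scores keep their initial order."""
    aspects = feedback_terms(ranked_docs, query, k, n_terms)
    return sorted(ranked_docs,
                  key=lambda d: comprehensiveness(d, aspects),
                  reverse=True)

# Toy initial ranking for the query "jaguar" (illustrative data only).
docs = [
    "jaguar speed",
    "jaguar habitat diet speed",
    "jaguar habitat diet",
]
reranked = rerank(docs, "jaguar", k=3, n_terms=3)
```

In this toy example the second document mentions all three proxy aspects and moves to the top. The sketch scores coverage only; the relevance and diversity signals that the abstract also mentions would have to be combined with it separately.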
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prasath, R., Sarkar, S. (2013). A Pseudo-Relevance Feedback Based Method to Find Comprehensive Web Documents. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_29
DOI: https://doi.org/10.1007/978-3-642-45114-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0
eBook Packages: Computer Science, Computer Science (R0)