ABSTRACT
In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) that re-ranks search results by optimizing two metrics: (1) diversity, which indicates the variance of topics in a group of documents; and (2) information richness, which measures how well a single document covers its topic. Both metrics are calculated from a directed link graph named the Affinity Graph (AG). The AG models the structure of a group of documents based on the asymmetric content similarities between each pair of documents. Experimental results on Yahoo! Directory, ODP data, and Newsgroup data demonstrate that the proposed ranking algorithm significantly improves search performance. Specifically, within the top 10 search results, the algorithm achieves a relative improvement of 31% in diversity and 12% in information richness.
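To make the idea of a directed graph built from asymmetric content similarities concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the similarity formula, so the normalization used here (dot product divided by the source document's vector norm only), the whitespace tokenizer, and the `threshold` parameter are all assumptions chosen for illustration; the function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def affinity(src, dst):
    """Asymmetric similarity of src toward dst.

    Assumption: the dot product is normalized by the norm of src only,
    so affinity(a, b) generally differs from affinity(b, a).
    """
    dot = sum(weight * dst[term] for term, weight in src.items())
    norm = math.sqrt(sum(weight * weight for weight in src.values()))
    return dot / norm if norm else 0.0


def build_affinity_graph(docs, threshold=0.5):
    """Return directed edges {(i, j): weight} for pairs above the threshold.

    The threshold is a hypothetical pruning parameter, not taken from the source.
    """
    vectors = [term_vector(doc) for doc in docs]
    edges = {}
    for i, vec_i in enumerate(vectors):
        for j, vec_j in enumerate(vectors):
            if i == j:
                continue
            weight = affinity(vec_i, vec_j)
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges


docs = ["web search", "web search ranking diversity"]
graph = build_affinity_graph(docs, threshold=1.2)
```

Because the normalization depends only on the source document, the short document has a stronger affinity toward the longer one than the reverse, so a threshold can keep one direction of an edge and drop the other. That directionality is what distinguishes this kind of graph from one built on a symmetric measure such as cosine similarity.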
Index Terms
- Improving web search results using affinity graph