Abstract
Complex search tasks often require posing several basic search queries, joining their answers, each of which is a ranked list of items, and returning a ranked list of combinations. However, the join result may contain too many repetitions of items, and the entire join is therefore frequently too large to be useful. This problem can be addressed by choosing a small subset of the join result. The focus of this paper is on how to choose this subset. We propose two measures for estimating the quality of result sets, namely, coverage and optimality ratio. Intuitively, maximizing the coverage aims at including in the result as many appearances as possible of items in their optimal combinations, and maximizing the optimality ratio means striving to have each item appear only in its optimal combination, i.e., only in the most highly ranked combination that contains it. One of the difficulties in choosing the subset of the join in a complex search is that maximizing the coverage conflicts with maximizing the optimality ratio.
In this paper, we introduce the measures coverage and optimality ratio. We present new semantics for complex search queries that aim at providing both high coverage and a high optimality ratio. We examine the quality of the results of the existing and the novel semantics according to these two measures, and we provide algorithms for answering complex search queries under the new semantics. Finally, we present an experimental study, using Yahoo! Local Search Web Services, of the efficiency and scalability of our algorithms, showing that complex search queries can be evaluated effectively under the proposed semantics.
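The intuition behind the two measures can be illustrated with a minimal sketch. The abstract does not give the formal definitions, so the formulas below are assumptions made only for illustration: coverage is taken to be the fraction of items whose optimal combination is in the result, and optimality ratio is taken to be the fraction of item appearances in the result that occur in the item's optimal combination. The function names and the toy hotel/restaurant data are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

Combination = Tuple[Hashable, ...]


def optimal_combinations(join: Sequence[Combination]) -> Dict[Hashable, Combination]:
    """Map each item to the most highly ranked combination that contains it.

    Assumption: `join` is already sorted from best to worst combination.
    """
    best: Dict[Hashable, Combination] = {}
    for combo in join:
        for item in combo:
            best.setdefault(item, combo)  # keep the first (highest-ranked) hit
    return best


def coverage(join: Sequence[Combination], result: Sequence[Combination]) -> float:
    """Illustrative coverage: share of items whose optimal combination is in `result`."""
    best = optimal_combinations(join)
    chosen = set(result)
    if not best:
        return 0.0
    return sum(1 for combo in best.values() if combo in chosen) / len(best)


def optimality_ratio(join: Sequence[Combination], result: Sequence[Combination]) -> float:
    """Illustrative optimality ratio: share of item appearances that are optimal."""
    best = optimal_combinations(join)
    appearances = [(item, combo) for combo in result for item in combo]
    if not appearances:
        return 0.0
    return sum(1 for item, combo in appearances if best.get(item) == combo) / len(appearances)


# Toy join of hotels (h*) and restaurants (r*), ranked best-first.
join = [("h1", "r1"), ("h1", "r2"), ("h2", "r1"), ("h2", "r3")]

full = join                   # everything: every item is covered, but with many repetitions
top1 = [("h1", "r1")]         # only the best combination: no repetitions, few items covered

print(coverage(join, full), optimality_ratio(join, full))
print(coverage(join, top1), optimality_ratio(join, top1))
```

Under these assumed definitions, returning the whole join covers every item but lowers the optimality ratio, while returning only the top combination gives a perfect optimality ratio with low coverage, which is the conflict described above.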
Notes
Some Web sites, e.g., www.tripadvisor.com, provide a list of hotels or a list of restaurants, with their rank and their price, for given parameters. Some of these Web sites, e.g., Yahoo! Travel, provide the result as a Web service. This allows applications to easily apply the search over several sources and integrate the results.
Skyline and RepeatedTop1 are not influenced by h and are only presented for comparison.
The association degree is the probability that two items are joined.
The running times for higher values of p are very long and therefore omitted.
Acknowledgements
This work was partially supported by The Israeli Ministry of Science and Technology (Grant 3/6472) and by the German–Israeli Foundation for Scientific Research and Development (Grant 2165/07).
Additional information
Communicated by: Kaushik Chakrabarti.
About this article
Cite this article
Shalem, M., Kanza, Y. On optimality-ratio and coverage in ranking of joined search results. Distrib Parallel Databases 30, 209–237 (2012). https://doi.org/10.1007/s10619-012-7095-1
DOI: https://doi.org/10.1007/s10619-012-7095-1