
On optimality-ratio and coverage in ranking of joined search results

Distributed and Parallel Databases

Abstract

In complex search tasks, it is often necessary to pose several basic search queries, each returning a ranked list of items, join the answers to these queries, and return a ranked list of combinations. However, the join result may contain too many repetitions of items, and hence the entire join is frequently too large to be useful. This can be addressed by choosing a small subset of the join result. The focus of this paper is on how to choose this subset. We propose two measures for estimating the quality of result sets, namely, coverage and optimality ratio. Intuitively, maximizing the coverage aims at including in the result as many appearances of items in their optimal combination as possible, and maximizing the optimality ratio means striving to have each item appear only in its optimal combination, i.e., only in the most highly ranked combination that contains it. One difficulty in choosing the subset of the join in a complex search is that maximizing the coverage conflicts with maximizing the optimality ratio.

In this paper, we introduce the measures coverage and optimality ratio. We present new semantics for complex search queries that aim to provide both high coverage and a high optimality ratio. We examine, according to these two measures, the quality of the results under existing semantics and under the novel ones, and we provide algorithms for answering complex search queries under the new semantics. Finally, we present an experimental study, using Yahoo! Local Search Web Services, of the efficiency and scalability of our algorithms, showing that complex search queries can be evaluated effectively under the proposed semantics.
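To make the two measures concrete, here is a minimal sketch. The precise definitions are assumptions read off the intuitive descriptions in the abstract, not the paper's formal definitions. An item's optimal combination is taken to be the highest-ranked combination in the full join that contains it. Coverage is taken as the fraction of items whose optimal combination appears in the chosen subset. Optimality ratio is taken as the fraction of item appearances in the subset that occur in that item's optimal combination. The function names and the toy hotel/restaurant join are hypothetical.

```python
from typing import Dict, List, Tuple

# A combination is one item from each joined list, e.g., (hotel, restaurant).
# Assumption: item ids are distinct across lists, and every combination in a
# chosen subset also appears in the ranked join.
Combination = Tuple[str, ...]


def optimal_combinations(ranked_join: List[Combination]) -> Dict[str, Combination]:
    """Map each item to the highest-ranked combination that contains it.

    `ranked_join` must be sorted from best to worst rank."""
    best: Dict[str, Combination] = {}
    for combo in ranked_join:
        for item in combo:
            # The first combination seen for an item is its best-ranked one.
            best.setdefault(item, combo)
    return best


def coverage(ranked_join: List[Combination], subset: List[Combination]) -> float:
    """Assumed definition: share of items whose optimal combination is in `subset`."""
    best = optimal_combinations(ranked_join)
    chosen = set(subset)
    hits = sum(1 for combo in best.values() if combo in chosen)
    return hits / len(best) if best else 0.0


def optimality_ratio(ranked_join: List[Combination], subset: List[Combination]) -> float:
    """Assumed definition: share of item appearances in `subset` that occur
    in that item's optimal combination."""
    best = optimal_combinations(ranked_join)
    appearances = [(item, combo) for combo in subset for item in combo]
    if not appearances:
        return 0.0
    optimal = sum(1 for item, combo in appearances if best[item] == combo)
    return optimal / len(appearances)


# Hypothetical ranked join of hotels (h*) and restaurants (r*), best first.
JOIN = [("h1", "r1"), ("h1", "r2"), ("h2", "r1"), ("h2", "r2")]

top3 = JOIN[:3]
top1 = JOIN[:1]
print(coverage(JOIN, top3), optimality_ratio(JOIN, top3))  # 1.0 0.6666666666666666
print(coverage(JOIN, top1), optimality_ratio(JOIN, top1))  # 0.5 1.0
```

In this toy join, taking the top three combinations includes every item's optimal combination (coverage 1.0) but repeats h1 and r1 in non-optimal combinations (optimality ratio 4/6). Keeping only the top combination gives an optimality ratio of 1.0 but a coverage of 0.5, which illustrates the conflict between the two measures described above.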




Notes

  1. Some Web sites, e.g., www.tripadvisor.com, return for given parameters a list of hotels or restaurants together with their rank and price. Some of these sites provide the result as a Web service, e.g., Yahoo! Travel, which allows applications to easily run the search over several sources and integrate the results.

  2. http://local.yahoo.com/.

  3. Skyline and RepeatedTop1 are not affected by h and are shown only for comparison.

  4. The association degree is the probability that two items are joined.

  5. Running times for higher values of p are very long and are therefore omitted.
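Footnote 4 defines the association degree as the probability that two items are joined. The sketch below shows one hypothetical way a synthetic join with a given association degree could be generated: every pair drawn from two lists is joined independently with that probability. It is not the paper's data generator; the function name `random_join`, the seed, and the list sizes are illustrative assumptions. With n and m items in the two lists, the expected join size is association_degree × n × m.

```python
import random
from typing import List, Tuple


def random_join(left: List[str], right: List[str],
                association_degree: float, seed: int = 0) -> List[Tuple[str, str]]:
    """Join each (left, right) pair independently with probability
    `association_degree` (hypothetical synthetic-data generator).

    Pairs are emitted in nested-loop order over the two input lists;
    ranking the resulting combinations is left to a separate step."""
    if not 0.0 <= association_degree <= 1.0:
        raise ValueError("association_degree must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible experiments
    return [(a, b) for a in left for b in right if rng.random() < association_degree]


hotels = [f"h{i}" for i in range(1, 101)]
restaurants = [f"r{i}" for i in range(1, 101)]
pairs = random_join(hotels, restaurants, association_degree=0.05, seed=42)
# The expected size is about 0.05 * 100 * 100 = 500 pairs.
print(len(pairs))
```

Because `random.random()` returns values in [0, 1), an association degree of 1.0 yields the full Cartesian product and 0.0 yields an empty join.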


Acknowledgements

This work was partially supported by The Israeli Ministry of Science and Technology (Grant 3/6472) and by the German–Israeli Foundation for Scientific Research and Development (Grant 2165/07).

Author information

Correspondence to Mirit Shalem.

Additional information

Communicated by: Kaushik Chakrabarti.


About this article

Cite this article

Shalem, M., Kanza, Y. On optimality-ratio and coverage in ranking of joined search results. Distrib Parallel Databases 30, 209–237 (2012). https://doi.org/10.1007/s10619-012-7095-1


  • DOI: https://doi.org/10.1007/s10619-012-7095-1
