On optimality-ratio and coverage in ranking of joined search results

Shalem, Mirit; Kanza, Yaron

doi:10.1007/s10619-012-7095-1


Distributed and Parallel Databases, Volume 30, pages 209–237 (2012)

Published: 13 June 2012



Abstract

In complex search tasks, it is often necessary to pose several basic search queries, where the answer to each query is a ranked list of items, to join these answers, and to return a ranked list of combinations. However, the join result may repeat the same items many times, so the entire join is frequently too large to be useful. This can be solved by choosing a small subset of the join result, and the focus of this paper is how to choose this subset. We propose two measures for estimating the quality of result sets: coverage and optimality ratio. Intuitively, maximizing coverage aims to include in the result as many appearances as possible of items in their optimal combination, whereas maximizing the optimality ratio means striving to have each item appear only in its optimal combination, i.e., only in the most highly ranked combination that contains it. One difficulty in choosing a subset of the join in a complex search is that maximizing coverage and maximizing the optimality ratio conflict with each other.

In this paper, we introduce the measures coverage and optimality ratio. We present new semantics for complex search queries that aim to provide both high coverage and a high optimality ratio. We evaluate the quality of the results of existing semantics and of the new ones according to these two measures, and we provide algorithms for answering complex search queries under the new semantics. Finally, we present an experimental study of the efficiency and scalability of our algorithms, using Yahoo! Local Search Web Services, which shows that complex search queries can be evaluated effectively under the proposed semantics.
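
The two measures can be made concrete with a small sketch. The paper's formal definitions are in the full text, which is not reproduced here, so the following is only one plausible reading of the abstract, stated as assumptions: a join result is a list of combinations (tuples of item identifiers) ordered from highest to lowest rank; an item's optimal combination is the highest-ranked combination containing it; coverage is taken as the fraction of items whose optimal combination is in the selected subset; and optimality ratio is taken as the fraction of item appearances in the subset that occur in the item's optimal combination. The function names and the hotel/restaurant identifiers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Combination = Tuple[str, ...]


def optimal_ranks(ranked_join: Sequence[Combination]) -> Dict[str, int]:
    """Map each item to the rank (index) of the highest-ranked combination containing it.

    Assumes ranked_join is sorted from best (index 0) to worst.
    """
    best: Dict[str, int] = {}
    for rank, combo in enumerate(ranked_join):
        for item in combo:
            best.setdefault(item, rank)  # keep only the first, i.e., best, rank
    return best


def coverage(ranked_join: Sequence[Combination], selected: List[int]) -> float:
    """Assumed definition: fraction of items whose optimal combination is selected."""
    best = optimal_ranks(ranked_join)
    if not best:
        return 0.0
    chosen = set(selected)
    covered = sum(1 for rank in best.values() if rank in chosen)
    return covered / len(best)


def optimality_ratio(ranked_join: Sequence[Combination], selected: List[int]) -> float:
    """Assumed definition: fraction of item appearances in the selected
    combinations that occur in the item's optimal combination."""
    best = optimal_ranks(ranked_join)
    appearances = [(item, rank) for rank in selected for item in ranked_join[rank]]
    if not appearances:
        return 0.0  # empty result: treated as 0 here (an arbitrary choice)
    optimal = sum(1 for item, rank in appearances if best[item] == rank)
    return optimal / len(appearances)


# Hypothetical join of a hotel list and a restaurant list, best combination first.
ranked_join = [("h1", "r1"), ("h1", "r2"), ("h2", "r1"), ("h2", "r2")]
for selected in ([0], [0, 1], [0, 1, 2, 3]):
    print(selected, coverage(ranked_join, selected), optimality_ratio(ranked_join, selected))
# [0] 0.5 1.0
# [0, 1] 0.75 0.75
# [0, 1, 2, 3] 1.0 0.5
```

Under these assumed definitions, selecting more combinations raises coverage from 0.5 to 1.0 but lowers the optimality ratio from 1.0 to 0.5, which illustrates the conflict between the two measures that the abstract describes.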


Similar content being viewed by others

Pay-as-you-go Approximate Join Top-k Processing for the Web of Data

Query Optimization Strategies in Similarity-Based Databases

Adaptive query relaxation and top-k result ranking over autonomous web databases (article, 16 August 2016; Xiangfu Meng, Xiaoyan Zhang, … Chongchun Bi)

Notes

Some Web sites, e.g., www.tripadvisor.com, provide for given parameters a list of hotels or a list of restaurants, with their rank and price. Some of these Web sites provide their results as a Web service, e.g., Yahoo! Travel. This allows applications to easily apply a search over several sources and integrate the results.
http://local.yahoo.com/.
Skyline and RepeatedTop1 are not affected by h; they are presented only for comparison.
The association degree is the probability that two items are joined.
The running times for higher values of p are very long and are therefore omitted.
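
The note defining the association degree can be illustrated with a minimal synthetic-data sketch. The generator below is hypothetical and is not the authors' experimental setup: it builds two lists of items with uniformly random scores, joins each pair of items independently with probability `association_degree`, and ranks the joined pairs by the sum of their scores (an assumed aggregation function).

```python
import random
from itertools import product
from typing import List, Tuple


def synthetic_join(n_left: int, n_right: int, association_degree: float,
                   seed: int = 0) -> List[Tuple[str, str]]:
    """Hypothetical generator of a ranked two-way join.

    Each (left, right) pair is joined independently with probability
    association_degree; joined pairs are ranked by the sum of item scores.
    """
    rng = random.Random(seed)
    # Two lists of (item id, score); scores are drawn uniformly from [0, 1).
    left = [(f"a{i}", rng.random()) for i in range(n_left)]
    right = [(f"b{j}", rng.random()) for j in range(n_right)]
    # rng.random() < p holds with probability p, so p = 1.0 keeps every pair
    # and p = 0.0 keeps none.
    joined = [(a, b) for a, b in product(left, right) if rng.random() < association_degree]
    # Rank from best to worst by the assumed aggregate score.
    joined.sort(key=lambda pair: pair[0][1] + pair[1][1], reverse=True)
    return [(a[0], b[0]) for a, b in joined]


print(len(synthetic_join(3, 4, 1.0)))  # 12: every pair is joined
print(len(synthetic_join(3, 4, 0.5)))  # about half of the 12 pairs, depending on the seed
```

Using a seeded `random.Random` instance keeps the generated join reproducible, so different selection semantics can be compared on identical inputs.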


Acknowledgements

This work was partially supported by The Israeli Ministry of Science and Technology (Grant 3/6472) and by the German–Israeli Foundation for Scientific Research and Development (Grant 2165/07).

Author information

Authors and Affiliations

Department of Computer Science, Technion, Haifa, Israel
Mirit Shalem & Yaron Kanza


Corresponding author

Correspondence to Mirit Shalem.

Additional information

Communicated by: Kaushik Chakrabarti.


About this article

Cite this article

Shalem, M., Kanza, Y. On optimality-ratio and coverage in ranking of joined search results. Distrib Parallel Databases 30, 209–237 (2012). https://doi.org/10.1007/s10619-012-7095-1


Published: 13 June 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s10619-012-7095-1
