Optimal algorithms for selecting top-k combinations of attributes: theory and applications

Lin, Chunbin; Lu, Jiaheng; Wei, Zhewei; Wang, Jianguo; Xiao, Xiaokui

doi:10.1007/s00778-017-0485-2

Regular Paper
Published: 26 October 2017

The VLDB Journal, Volume 27, pages 27–52 (2018)

Chunbin Lin (ORCID: orcid.org/0000-0002-7068-9929)¹, Jiaheng Lu², Zhewei Wei³, Jianguo Wang¹ & Xiaokui Xiao⁴

Abstract

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover the k objects, e.g., the top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes rather than the best objects. A straightforward extension of the existing top-k algorithms is prohibitively expensive for answering top-k combinations, because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which finds the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of top objects whose scores are aggregated to score a combination. We propose a family of efficient top-k, m algorithms that cover different data access methods (sorted accesses and random accesses) and different query certainties (exact and approximate query processing). Theoretically, we prove that our algorithms are instance optimal, and we analyze the bound on the depth of accesses. We further develop optimizations for query evaluation that reduce the computational and memory costs as well as the number of accesses. We provide a case study of real applications of top-k, m queries in an online biomedical search engine. Finally, we perform comprehensive experiments on multiple real-life datasets to demonstrate the scalability and efficiency of top-k, m algorithms.
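The abstract points out that a straightforward extension of classic top-k algorithms must enumerate every combination of attributes. As a hedged illustration of that baseline only (it is not one of the article's algorithms), the sketch below scores each combination by brute force. Its data model is an assumption: attributes are partitioned into groups, a combination picks one attribute from each group, and each attribute owns a list of (object, score) pairs sorted by decreasing score. The aggregation rules (an object counts only if it appears in every chosen list, object and combination scores are sums, combinations with fewer than m qualifying objects are skipped), the function name `naive_top_k_m` and the team/season labels are all hypothetical.

```python
import heapq
from itertools import product

# Hypothetical data model: each group maps an attribute value to a list of
# (object_id, score) pairs sorted by score in decreasing order.
Groups = list[dict[str, list[tuple[str, float]]]]


def naive_top_k_m(groups: Groups, k: int, m: int) -> list[tuple[float, tuple[str, ...]]]:
    """Brute-force top-k, m baseline: score every combination, keep the best k.

    Assumptions for illustration only: an object counts for a combination only
    if it appears in every chosen list; its score is the sum of its list scores;
    a combination's score is the sum of its top-m object scores; combinations
    with fewer than m qualifying objects are skipped.
    """
    results = []
    # One attribute per group: the number of combinations is the product of
    # the group sizes, i.e. exponential in the number of groups.
    for combo in product(*(sorted(g) for g in groups)):
        # Build a lookup (object -> score) for each chosen attribute's list.
        lists = [dict(groups[i][attr]) for i, attr in enumerate(combo)]
        # Objects present in every chosen list.
        common = set(lists[0]).intersection(*lists[1:])
        if len(common) < m:
            continue
        object_scores = [sum(lst[o] for lst in lists) for o in common]
        combo_score = sum(heapq.nlargest(m, object_scores))
        results.append((combo_score, combo))
    # Highest-scoring combinations first.
    return heapq.nlargest(k, results)


# Hypothetical example: group 0 = teams, group 1 = seasons, objects = games.
groups = [
    {"TeamA": [("g1", 9.0), ("g2", 7.0), ("g3", 1.0)],
     "TeamB": [("g2", 8.0), ("g4", 6.0), ("g3", 5.0)]},
    {"2010": [("g1", 5.0), ("g2", 4.0), ("g4", 3.0)],
     "2011": [("g3", 6.0), ("g4", 2.0)]},
]
print(naive_top_k_m(groups, k=2, m=2))
# -> [(25.0, ('TeamA', '2010')), (21.0, ('TeamB', '2010'))]
```

Because the loop visits the product of the group sizes, adding one more group of size g multiplies the work by g; this is the enumeration cost that the abstract describes as prohibitive for the straightforward approach.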

Similar content being viewed by others

Top-k List Aggregation: Mathematical Formulations and Polyhedral Comparisons

Exact and Approximate Generic Multi-criteria Top-k Query Processing

Top-k Queries Over Uncertain Scores

Notes

Lists are sorted by scores decreasingly.
The score is computed by an aggregation of various scoring items provided by the NBA for the corresponding game.
The top-2 games of each combination are shown in Fig. 1b.
Hash indexes can be built to achieve the goal of random accesses.
http://www.ncbi.nlm.nih.gov/pubmed.
http://www.nlm.nih.gov/mesh/meshhome.html.
http://www.nba.com/.
http://developer.yahoo.com/yql/console/.
http://dblp.uni-trier.de/xml.
http://www.ncbi.nlm.nih.gov/pubmed.
http://www.wonko.info/ipt/babel.htm.
http://www.nlm.nih.gov/mesh/meshhome.html.
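The first and fourth notes describe the access model behind classic top-k processing: each list is sorted by decreasing score (sorted access), and a hash index over each list supports random access by object. As a hedged illustration of that model only, the following sketches the Threshold Algorithm (TA) of Fagin, Lotem and Naor (listed in the references) for ordinary top-k objects with sum aggregation; it is not the top-k, m algorithm of this article. Python dicts stand in for the hash indexes, and treating an object that is absent from a list as scoring 0 in that list is an assumption of the sketch.

```python
import heapq


def threshold_algorithm(lists: list[list[tuple[str, float]]], k: int) -> list[tuple[float, str]]:
    """Sketch of Fagin et al.'s TA with sum aggregation.

    lists: one list per attribute, each holding (object_id, score) pairs
    sorted by score in decreasing order. Returns the top-k (score, object_id).
    Assumption: an object missing from a list scores 0 in that list.
    """
    # Hash indexes (dicts) provide random access: object_id -> score.
    indexes = [dict(lst) for lst in lists]
    seen: dict[str, float] = {}  # object_id -> complete aggregated score
    max_depth = max(len(lst) for lst in lists)
    for depth in range(max_depth):
        threshold = 0.0
        for lst in lists:
            if depth < len(lst):
                obj, score = lst[depth]  # sorted access at the current depth
                threshold += score       # exhausted lists contribute 0
                if obj not in seen:
                    # Random accesses complete the object's aggregated score.
                    seen[obj] = sum(idx.get(obj, 0.0) for idx in indexes)
        top = heapq.nlargest(k, ((s, o) for o, s in seen.items()))
        # No unseen object can beat the threshold, so stop once the k-th
        # best complete score reaches it.
        if len(top) == k and top[-1][0] >= threshold:
            return top
    return heapq.nlargest(k, ((s, o) for o, s in seen.items()))


lists = [
    [("a", 9), ("b", 8), ("c", 1)],
    [("b", 9), ("c", 7), ("a", 2)],
]
print(threshold_algorithm(lists, k=1))  # -> [(17, 'b')]
```

With k=1 the loop stops after depth 1, before reading the last entry of either list, which is the early termination that sorted access plus random access makes possible.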

References

Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In: SODA, pp. 633–634 (2002)
Bast, H., Majumdar, D., Schenkel, R., Theobald, M., Weikum, G.: IO-Top-k: index-access optimized top-k query processing. In: VLDB, pp. 475–486 (2006)
Bruno, N., Chaudhuri, S., Gravano, L.: Top-k selection queries over relational databases: mapping strategies and performance evaluation. TODS 27(2), 153–187 (2002)
Bruno, N., Gravano, L., Marian, A.: Evaluating top-k queries over web-accessible databases. In: ICDE, pp. 369–380 (2002)
Chang, K.C.-C., Hwang, S.-W.: Minimal probing: supporting expensive predicates for top-k queries. In: SIGMOD, pp. 346–357 (2002)
Chen, L.J., Papakonstantinou, Y.: Supporting top-k keyword search in XML databases. In: ICDE, pp. 689–700 (2010)
Dylla, M., Miliaraki, I., Theobald, M.: Top-k query processing in probabilistic databases with non-materialized views. In: ICDE, pp. 122–133 (2013)
Fagin, R.: Combining fuzzy information from multiple systems. In: PODS, pp. 216–226 (1996)
Fagin, R., Kumar, R., Sivakumar, D.: Comparing top k lists. SIDMA 17(1), 134–160 (2003)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. JCSS 66(4), 614–656 (2003)
Fan, W., Wang, X., Wu, Y.: Diversified top-k graph pattern matching. PVLDB 6(13), 1510–1521 (2013)
Feng, J., Li, G., Wang, J.: Finding top-k answers in keyword search over relational databases using tuple units. TKDE 23(12), 1781–1794 (2011)
Güntzer, U., Balke, W.-T., Kießling, W.: Towards efficient multi-feature queries in heterogeneous environments. In: ITCC, pp. 622–628 (2001)
Güntzer, U., Balke, W.-T., Kießling, W.: Optimizing multi-feature queries for image databases. In: VLDB, pp. 419–428 (2000)
He, R., Lin, C., McAuley, J.: Fashionista: A fashion-aware graphical system for exploring visually similar items. In: WWW, pp. 199–202 (2016)
He, R., Lin, C., Wang, J., McAuley, J.: Sherlock: sparse hierarchical embeddings for visually-aware one-class collaborative filtering. In: IJCAI, pp. 3740–3746 (2016)
Hua, M., Pei, J., Fu, A.W., Lin, X., Leung, H.: Top-k typicality queries and efficient query answering methods on large databases. VLDB J. 18(3), 809–835 (2009)
Ilyas, I.F., Aref, W.G., Elmagarmid, A.K.: Joining ranked inputs in practice. In: VLDB, pp. 950–961 (2002)
Ilyas, I.F., Aref, W.G., Elmagarmid, A.K.: Supporting top-k join queries in relational databases. In: VLDB, pp. 754–765 (2003)
Ilyas, I.F., Aref, W.G., Elmagarmid, A.K.: Supporting top-k join queries in relational databases. VLDB J. 13(3), 207–221 (2004)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. CSUR 40(4), 11 (2008)
Li, C., Chang, K.C.-C., Ilyas, I.F.: Supporting ad-hoc ranking aggregates. In: SIGMOD, pp. 61–72 (2006)
Li, J., Liu, C., Zhou, R., Wang, W.: Top-k keyword search over probabilistic XML data. In: ICDE, pp. 673–684 (2011)
Lian, X., Chen, L.: Shooting top-k stars in uncertain databases. VLDB J. 20(6), 819–840 (2011)
Lu, E.H.-C., Chen, C.-Y., Tseng, V.S.: Personalized trip recommendation with multiple constraints by mining user check-in behaviors. In: SIGSPATIAL GIS, pp. 209–218 (2012)
Lu, J., Senellart, P., Lin, C., Du, X., Wang, S., Chen, X.: Optimal top-k generation of attribute combinations based on ranked lists. In: SIGMOD, pp. 409–420 (2012)
Mamoulis, N., Yiu, M.L., Cheng, K.H., Cheung, D.W.: Efficient top-k aggregation of ranked inputs. TODS 32(3), 19 (2007)
Marian, A., Amer-Yahia, S., Koudas, N., Srivastava, D.: Adaptive processing of top-k queries in XML. In: ICDE, pp. 162–173 (2005)
Michel, S., Triantafillou, P., Weikum, G.: KLEE: a framework for distributed top-k query algorithms. In: VLDB, pp. 637–648 (2005)
Natsev, A., Chang, Y.-C., Smith, J.R., Li, C.-S., Vitter, J.S.: Supporting incremental join queries on ranked inputs. In: VLDB, pp. 281–290 (2001)
Nepal, S., Ramakrishna, M.V.: Query processing issues in image (multimedia) databases. In: ICDE, pp. 22–29 (1999)
Qiao, M., Qin, L., Cheng, H., Yu, J.X., Tian, W.: Top-k nearest keyword search on large graphs. PVLDB 6(10), 901–912 (2013)
Ranu, S., Hoang, M.X., Singh, A.K.: Answering top-k representative queries on graph databases. In: SIGMOD, pp. 1163–1174 (2014)
Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE, pp. 886–895 (2007)
Schwartz, J.T.: Fast probabilistic algorithms for verification of polynomial identities. JACM 27(4), 701–717 (1980)
Shang, S., Ding, R., Yuan, B., Xie, K., Zheng, K., Kalnis, P.: User oriented trajectory search for trip recommendation. In: EDBT, pp. 156–167 (2012)
Soliman, M.A., Ilyas, I.F., Chang, K.C.-C.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)
Soliman, M.A., Ilyas, I.F., Chang, K.C.-C.: Probabilistic top-k and ranking-aggregate queries. TODS 33(3), 13 (2008)
Theobald, M., Schenkel, R., Weikum, G.: An efficient and versatile query engine for TopX search. In: VLDB, pp. 625–636 (2005)
Theobald, M., Weikum, G., Schenkel, R.: Top-k query evaluation with probabilistic guarantees. In: VLDB, pp. 648–659 (2004)
Varadarajan, R., Farfán, F., Hristidis, V.: Comparing top-k XML lists. Inf. Syst. 38(6), 820–834 (2013)
Yang, S., Han, F., Wu, Y., Yan, X.: Fast top-k search in knowledge graphs. In: ICDE, pp. 990–1001 (2016)
Yang, Z., Fu, A.W., Liu, R.: Diversified top-k subgraph querying in a large graph. In: SIGMOD, pp. 1167–1182 (2016)
Yiu, M.L., Mamoulis, N., Hristidis, V.: Extracting k most important groups from data efficiently. DKE 66(2), 289–310 (2008)
Zhang, X., Chomicki, J.: Semantics and evaluation of top-k queries in probabilistic databases. Distrib. Parallel Databases 26(1), 67–126 (2009)
Zhu, R., Zou, Z., Li, J.: Towards efficient top-k reliability search on uncertain graphs. KAIS 50(3), 723–750 (2017)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California, San Diego, USA
Chunbin Lin & Jianguo Wang
Department of Computer Science, University of Helsinki, Helsinki, Finland
Jiaheng Lu
School of Information, Renmin University of China, Beijing, China
Zhewei Wei
School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Xiaokui Xiao

Corresponding author

Correspondence to Chunbin Lin.

Additional information

This work is partially supported by NSF BIGDATA 1447943, Academy of Finland (310321), NSF China (61472427, 61502503), the DSAIR center at NTU and Grant MOE2015-T2-2-069, Singapore.

About this article

Cite this article

Lin, C., Lu, J., Wei, Z. et al. Optimal algorithms for selecting top-k combinations of attributes: theory and applications. The VLDB Journal 27, 27–52 (2018). https://doi.org/10.1007/s00778-017-0485-2

Received: 21 February 2017
Revised: 03 September 2017
Accepted: 30 September 2017
Published: 26 October 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s00778-017-0485-2
