Optimal algorithms for selecting top-k combinations of attributes: theory and applications

  • Regular Paper
  • The VLDB Journal

Abstract

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining, and databases. They are designed to discover the k objects, e.g., the top-k restaurants, with the highest overall scores aggregated from different attributes, e.g., price and location. However, emerging applications such as query recommendation require the best combinations of attributes rather than the best objects. A straightforward extension of existing top-k algorithms is prohibitively expensive for answering top-k combination queries, because it must enumerate all possible combinations, whose number is exponential in the number of attributes. In this article, we formalize a novel type of top-k query, called top-km, which aims to find the top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-km algorithms with different data access methods, i.e., sorted accesses and random accesses, and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound on the depth of accesses. We further develop optimizations for efficient query evaluation that reduce the computational cost, the memory cost, and the number of accesses. We provide a case study on a real application of top-km queries in an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-km algorithms on multiple real-life datasets.
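
To make the query semantics concrete, the following is a minimal sketch of the brute-force baseline that the abstract describes as prohibitively expensive. It is not one of the article's algorithms. The data layout (attributes arranged in groups, one attribute chosen per group), the use of summation as the aggregation function, the restriction to objects present in every chosen list, and the name `brute_force_top_km` are all assumptions made for illustration.

```python
import heapq
from itertools import product


def brute_force_top_km(groups, k, m):
    """Enumerate every attribute combination and return the k best.

    groups: non-empty list of dicts, each mapping an attribute name to a
            dict {object_id: score}. (Hypothetical layout, assumed here.)
    A combination picks one attribute from each group. Its score is the
    sum of the m highest object scores, where an object's score is the
    sum of its scores across the chosen attributes (assumed aggregation).
    """
    scored = []
    # The number of combinations is the product of the group sizes,
    # which is why this baseline does not scale.
    for combo in product(*(g.items() for g in groups)):
        names = tuple(name for name, _ in combo)
        lists = [scores for _, scores in combo]
        # Keep only objects that appear in every chosen attribute list.
        common = set(lists[0]).intersection(*lists[1:])
        object_scores = [sum(lst[obj] for lst in lists) for obj in common]
        combo_score = sum(heapq.nlargest(m, object_scores))
        scored.append((combo_score, names))
    # Highest-scoring combinations first.
    return heapq.nlargest(k, scored)


# Toy data with integer scores (invented for this example).
groups = [
    {"a1": {"o1": 9, "o2": 5, "o3": 1}, "a2": {"o1": 2, "o2": 3, "o3": 4}},
    {"b1": {"o1": 8, "o2": 7, "o3": 6}, "b2": {"o1": 1, "o2": 1, "o3": 1}},
]
print(brute_force_top_km(groups, k=2, m=2))
# → [(29, ('a1', 'b1')), (20, ('a2', 'b1'))]
```

With two groups of two attributes each, the sketch scores four combinations. Every additional group multiplies that count by its size, which is the blow-up that motivates algorithms relying on sorted and random accesses instead of full enumeration.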

Notes

  1. Lists are sorted by scores decreasingly.

  2. The score is computed by an aggregation of various scoring items provided by the NBA for the corresponding game.

  3. The top-2 games of each combination are shown in Fig. 1b.

  4. Hash indexes can be built to achieve the goal of random accesses.

  5. http://www.ncbi.nlm.nih.gov/pubmed.

  6. http://www.nlm.nih.gov/mesh/meshhome.html.

  7. http://www.nba.com/.

  8. http://developer.yahoo.com/yql/console/.

  9. http://dblp.uni-trier.de/xml.

  10. http://www.ncbi.nlm.nih.gov/pubmed.

  11. http://www.wonko.info/ipt/babel.htm.

  12. http://www.nlm.nih.gov/mesh/meshhome.html.

Author information

Corresponding author

Correspondence to Chunbin Lin.

Additional information

This work is partially supported by NSF BIGDATA 1447943, Academy of Finland (310321), NSF China (61472427, 61502503), DSAIR center in NTU and Grant MOE2015-T2-2-069 Singapore.

About this article

Cite this article

Lin, C., Lu, J., Wei, Z. et al. Optimal algorithms for selecting top-k combinations of attributes: theory and applications. The VLDB Journal 27, 27–52 (2018). https://doi.org/10.1007/s00778-017-0485-2

  • DOI: https://doi.org/10.1007/s00778-017-0485-2