Aggregate keyword search on large relational databases

  • Regular Paper
Knowledge and Information Systems

Abstract

Keyword search has recently been extended to relational databases to retrieve information from text-rich attributes. However, all existing methods focus on finding individual tuples matching a set of query keywords from one table or the join of multiple tables. In this paper, we motivate a novel problem of aggregate keyword search: finding minimal group-bys covering a set of query keywords well, which is useful in many applications. We develop two approaches to tackle the problem. We further extend our methods to allow partial matches and matches using a keyword ontology. An extensive empirical evaluation using both real and synthetic data sets is reported to verify the effectiveness of aggregate keyword search and the efficiency of our methods.
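
The problem statement above can be made concrete with a small brute-force sketch. It is not either of the paper's approaches and ignores partial and ontology-based matches; it is a Python illustration under assumed definitions: a table is a list of dicts, each carrying dimension values and a hypothetical `keywords` set standing in for its tokenized text attributes; a group-by cell fixes some dimensions and aggregates the rest (written `*`); a cell covers the query if its tuples together contain every query keyword; and a covering cell is minimal if no strictly more specific cell also covers the query. All names and data are illustrative.

```python
from itertools import combinations

STAR = "*"  # assumed marker for an aggregated ("don't care") dimension in a cell


def covering_cells(rows, dims, query):
    """Return every non-empty group-by cell whose tuples jointly contain all query keywords."""
    query = set(query)
    n = len(dims)
    covering = set()
    # Each subset of dimensions defines one group-by; there are 2**n of them.
    for k in range(n + 1):
        for fixed in combinations(range(n), k):
            groups = {}
            for row in rows:
                cell = tuple(row[dims[i]] if i in fixed else STAR for i in range(n))
                # Union the keywords of all tuples that fall into this cell.
                groups.setdefault(cell, set()).update(row["keywords"])
            for cell, words in groups.items():
                if query <= words:
                    covering.add(cell)
    return covering


def is_descendant(child, parent):
    """True if child is strictly more specific than parent, agreeing on every value parent fixes."""
    return child != parent and all(p == STAR or p == c for c, p in zip(child, parent))


def minimal_group_bys(rows, dims, query):
    """Covering cells that have no covering descendant (brute force, exact matches only)."""
    cover = covering_cells(rows, dims, query)
    return {c for c in cover if not any(is_descendant(d, c) for d in cover)}


# Hypothetical toy table: two dimensions plus a keyword set per tuple.
DIMS = ("month", "city")
ROWS = [
    {"month": "Dec", "city": "Boston", "keywords": {"snow"}},
    {"month": "Dec", "city": "Boston", "keywords": {"outage"}},
    {"month": "Dec", "city": "Denver", "keywords": {"snow"}},
    {"month": "Jan", "city": "Denver", "keywords": {"outage"}},
]

if __name__ == "__main__":
    print(sorted(minimal_group_bys(ROWS, DIMS, {"snow", "outage"})))
    # [('*', 'Denver'), ('Dec', 'Boston')]
```

Because an ancestor cell aggregates a superset of its descendant's tuples, every ancestor of a covering cell also covers the query; the minimal cells are therefore the most specific answers. Enumerating all 2^n group-bys is exponential in the number of dimensions n, so this sketch only suits toy inputs.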

References

  1. Agrawal S, Chaudhuri S, Das G (2002) DBXplorer: a system for keyword-based search over relational databases. In: Proceedings of the 18th international conference on data engineering (ICDE’02). IEEE Computer Society, Washington, DC, USA, pp 5–16

  2. Amer-Yahia S, Case P, Rölleke T, Shanmugasundaram J, Weikum G (2005) Report on the DB/IR panel at SIGMOD 2005. ACM, New York, NY, USA, vol 34, pp 71–74

  3. Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the thirtieth international conference on very large data bases (VLDB’04). VLDB Endowment, pp 564–575

  4. Beyer K, Ramakrishnan R (1999) Bottom-up computation of sparse and iceberg cube. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’99). ACM, New York, NY, USA, pp 359–370

  5. Bhalotia G, Hulgeri A, Nakhe C, Chakrabarti S, Sudarshan S (2002) Keyword searching and browsing in databases using BANKS. In: Proceedings of the 18th international conference on data engineering (ICDE’02). IEEE Computer Society, pp 431–440

  6. Chaudhuri S, Das G (2009) Keyword querying and ranking in databases. PVLDB 2(2): 1658–1659

  7. Chaudhuri S, Ramakrishnan R, Weikum G (2005) Integrating DB and IR technologies: What is the sound of one hand clapping? In: Proceedings of the 2nd biennial conference on innovative data systems research (CIDR’05), pp 1–12

  8. Chen Y, Wang W, Liu Z, Lin X (2009) Keyword search on structured and semi-structured data. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’09). ACM, pp 1005–1010

  9. Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. McGraw-Hill Higher Education, Cambridge

  10. Daoud M, Lechani LT, Boughanem M (2009) Towards a graph-based user profile modeling for a session-based personalized search. Knowl Inf Syst 21(3): 365–398

  11. Ding B, Yu JX, Wang S, Qin L, Zhang X, Lin X (2007) Finding top-k min-cost connected trees in databases. In: Proceedings of the 23rd IEEE international conference on data engineering (ICDE’07). IEEE Computer Society, Washington, DC, USA, pp 836–845

  12. Dreyfus SE, Wagner RA (1972) The Steiner problem in graphs. Networks 1: 195–207

  13. Fang M, Shivakumar N, Garcia-Molina H, Motwani R, Ullman JD (1998) Computing iceberg queries efficiently. In: Proceedings of the international conference on very large data bases (VLDB’98). New York, NY, pp 299–310

  14. Fellbaum C (ed) (1998) WordNet: an electronic lexical database. MIT Press, Cambridge

  15. Feng Y, Agrawal D, Abbadi AE, Metwally A (2004) Range Cube: efficient cube computation by exploiting data correlation. In: Proceedings of the international conference on data engineering (ICDE’04). Boston, MA, pp 658–669

  16. Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co, New York

  17. Gong Z, Liu Q (2009) Improving keyword based web image search with visual feature distribution and term expansion. Knowl Inf Syst 21(1): 113–132

  18. Gray J, Bosworth A, Layman A, Pirahesh H (1996) Data cube: a relational operator generalizing group-by, cross-tab and sub-totals. In: Proceedings of the international conference on data engineering (ICDE’96). New Orleans, Louisiana, pp 152–159

  19. Han J, Pei J, Dong G, Wang K (2001) Efficient computation of iceberg cubes with complex measures. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’01). Santa Barbara, CA, pp 1–12

  20. Harman D, Baeza-Yates R, Fox E, Lee W (1992) Inverted files. In: Information retrieval: data structures and algorithms. Prentice-Hall Inc., Upper Saddle River, NJ, USA, pp 28–43

  21. He H, Wang H, Yang J, Yu PS (2007) BLINKS: ranked keyword searches on graphs. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 305–316

  22. Henzinger M, Motwani R, Silverstein C (2003) Challenges in web search engines. In: Proceedings of the 18th international joint conference on artificial intelligence (IJCAI’03), pp 1573–1579

  23. Hristidis V, Papakonstantinou Y (2002) DISCOVER: keyword search in relational databases. In: Proceedings of the 28th international conference on very large data bases (VLDB’02). Morgan Kaufmann, pp 670–681

  24. Hristidis V, Gravano L, Papakonstantinou Y (2003) Efficient IR-style keyword search over relational databases. In: Proceedings of the 29th international conference on very large data bases (VLDB’03), pp 850–861

  25. Kacholia V, Pandit S, Chakrabarti S, Sudarshan S, Desai R, Karambelkar H (2005) Bidirectional expansion for keyword search on graph databases. In: Proceedings of the 31st international conference on very large data bases (VLDB’05). ACM, pp 505–516

  26. Kimelfeld B, Sagiv Y (2006) Finding and approximating top-k answers in keyword proximity search. In: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS’06). ACM, New York, NY, USA, pp 173–182

  27. Li G, Ooi BC, Feng J, Wang J, Zhou L (2008) EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’08). ACM, New York, NY, USA, pp 903–914

  28. Liu F, Yu C, Meng W, Chowdhury A (2006) Effective keyword search in relational databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’06). ACM, New York, NY, USA, pp 563–574

  29. Liu Z, Chen Y (2007) Identifying meaningful return information for XML keyword search. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 329–340

  30. Luo Y, Lin X, Wang W, Zhou X (2007) SPARK: top-k keyword query in relational databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 115–126

  31. Ng RT, Wagner AS, Yin Y (2001) Iceberg-cube computation with PC clusters. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’01). Santa Barbara, CA

  32. Park J, Lee S-g (2010) Keyword search in relational databases. Knowl Inf Syst (Online First). doi:10.1007/s10115-010-0284-1

  33. Qin L, Yu JX, Chang L (2009a) Keyword search in databases: the power of RDBMS. In: Proceedings of the 35th SIGMOD international conference on management of data (SIGMOD’09). ACM Press, Providence, Rhode Island, USA, pp 681–694

  34. Qin L, Yu JX, Chang L, Tao Y (2009b) Querying communities in relational databases. In: Proceedings of the 25th international conference on data engineering (ICDE’09). IEEE, pp 724–735

  35. Taha K, Elmasri R (2010) BussEngine: a business search engine. Knowl Inf Syst 23(2): 153–197

  36. Tong H, Faloutsos C, Pan JY (2008) Random walk with restart: fast solutions and applications. Knowl Inf Syst 14(3): 327–346

  37. Vu QH, Ooi BC, Papadias D, Tung AKH (2008) A graph method for keyword-based selection of the top-k databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’08). ACM, New York, NY, USA

  38. Weikum G (2007) DB&IR: both sides now. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 25–30

  39. Wu P, Sismanis Y, Reinwald B (2007) Towards keyword-driven analytical processing. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 617–628

  40. Xin D, Han J, Li X, Wah BW (2003) Star-cubing: computing iceberg cubes by top-down and bottom-up integration. In: Proceedings of the international conference on very large data bases (VLDB’03). Berlin, Germany, pp 476–487

  41. Yu B, Li G, Sollins K, Tung AKH (2007) Effective keyword-based selection of relational databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 139–150

  42. Zhou B, Pei J (2009) Answering aggregate keyword queries on relational databases using minimal group-bys. In: Proceedings of the 12th international conference on extending database technology (EDBT’09). Saint-Petersburg, Russia

Author information

Correspondence to Bin Zhou.

Additional information

A preliminary version of this paper appeared as [42]. The authors are grateful to the anonymous reviewers and the associate editor for their constructive comments, which helped improve the quality of the paper. This research is supported in part by an NSERC Discovery Grant, an NSERC Discovery Accelerator Supplement Grant, and a British Columbia Natural Resources and Applied Sciences Endowment Fund. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

About this article

Cite this article

Zhou, B., Pei, J. Aggregate keyword search on large relational databases. Knowl Inf Syst 30, 283–318 (2012). https://doi.org/10.1007/s10115-011-0379-3

  • DOI: https://doi.org/10.1007/s10115-011-0379-3
