Abstract
Keyword search has recently been extended to relational databases to retrieve information from text-rich attributes. However, existing methods focus on finding individual tuples that match a set of query keywords, either from one table or from the join of multiple tables. In this paper, we motivate a novel problem, aggregate keyword search: finding minimal group-bys that cover a set of query keywords well, which is useful in many applications. We develop two approaches to tackle the problem, and we further extend our methods to allow partial matches and matches using a keyword ontology. An extensive empirical evaluation on both real and synthetic data sets verifies the effectiveness of aggregate keyword search and the efficiency of our methods.
References
Agrawal S, Chaudhuri S, Das G (2002) DBXplorer: a system for keyword-based search over relational databases. In: Proceedings of the 18th international conference on data engineering (ICDE’02). IEEE Computer Society Washington, DC, USA, pp 5–16
Amer-Yahia S, Case P, Rölleke T, Shanmugasundaram J, Weikum G (2005) Report on the DB/IR panel at SIGMOD 2005. ACM, New York, NY, USA, vol 34, pp 71–74
Balmin A, Hristidis V, Papakonstantinou Y (2004) ObjectRank: authority-based keyword search in databases. In: Proceedings of the thirtieth international conference on very large data bases (VLDB’04). VLDB Endowment, pp 564–575
Beyer K, Ramakrishnan R (1999) Bottom-up computation of sparse and iceberg cube. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’99). ACM, New York, NY, USA, pp 359–370
Bhalotia G, Hulgeri A, Nakhe C, Chakrabarti S, Sudarshan S (2002) Keyword searching and browsing in databases using BANKS. In: Proceedings of the 18th international conference on data engineering (ICDE’02). IEEE Computer Society, pp 431–440
Chaudhuri S, Das G (2009) Keyword querying and ranking in databases. PVLDB 2(2): 1658–1659
Chaudhuri S, Ramakrishnan R, Weikum G (2005) Integrating DB and IR technologies: What is the sound of one hand clapping? In: Proceedings of the 2nd biennial conference on innovative data systems research (CIDR’05), pp 1–12
Chen Y, Wang W, Liu Z, Lin X (2009) Keyword search on structured and semi-structured data. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’09). ACM, pp 1005–1010
Cormen TH, Leiserson CE, Rivest RL, Stein C (2001) Introduction to algorithms. McGraw-Hill Higher Education, Cambridge
Daoud M, Lechani LT, Boughanem M (2009) Towards a graph-based user profile modeling for a session-based personalized search. Knowl Inf Syst 21(3): 365–398
Ding B, Yu JX, Wang S, Qin L, Zhang X, Lin X (2007) Finding top-k min-cost connected trees in databases. In: Proceedings of the 23rd IEEE international conference on data engineering (ICDE’07). IEEE Computer Society, Washington, DC, USA, pp 836–845
Dreyfus SE, Wagner RA (1972) The Steiner problem in graphs. Networks 1: 195–207
Fang M, Shivakumar N, Garcia-Molina H, Motwani R, Ullman JD (1998) Computing iceberg queries efficiently. In: Proceedings of the international conference on very large data bases (VLDB’98). New York, NY, pp 299–310
Fellbaum C (ed) (1998) WordNet: an electronic lexical database. MIT Press, Cambridge
Feng Y, Agrawal D, Abbadi AE, Metwally A (2004) Range Cube: efficient cube computation by exploiting data correlation. In: Proceedings of the international conference on data engineering (ICDE’04). Boston, MA, pp 658–669
Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co, New York
Gong Z, Liu Q (2009) Improving keyword based web image search with visual feature distribution and term expansion. Knowl Inf Syst 21(1): 113–132
Gray J, Bosworth A, Layman A, Pirahesh H (1996) Data cube: a relational operator generalizing group-by, cross-tab and sub-totals. In: Proceedings of the international conference on data engineering (ICDE’96). New Orleans, Louisiana, pp 152–159
Han J, Pei J, Dong G, Wang K (2001) Efficient computation of iceberg cubes with complex measures. In: Proceedings of ACM-SIGMOD international conference on management of data (SIGMOD’01). Santa Barbara, CA, pp 1–12
Harman D, Baeza-Yates R, Fox E, Lee W (1992) Inverted files. In: Information retrieval: data structures and algorithms. Prentice-Hall Inc., Upper Saddle River, NJ, USA, pp 28–43
He H, Wang H, Yang J, Yu PS (2007) BLINKS: ranked keyword searches on graphs. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 305–316
Henzinger M, Motwani R, Silverstein C (2003) Challenges in web search engines. In: Proceedings of the 18th international joint conference on artificial intelligence (IJCAI’03), pp 1573–1579
Hristidis V, Papakonstantinou Y (2002) DISCOVER: keyword search in relational databases. In: Proceedings of the 28th international conference on very large data bases (VLDB’02). Morgan Kaufmann, pp 670–681
Hristidis V, Gravano L, Papakonstantinou Y (2003) Efficient IR-style keyword search over relational databases. In: Proceedings of the 29th international conference on very large data bases (VLDB’03), pp 850–861
Kacholia V, Pandit S, Chakrabarti S, Sudarshan S, Desai R, Karambelkar H (2005) Bidirectional expansion for keyword search on graph databases. In: Proceedings of the 31st international conference on very large data bases (VLDB’05). ACM, pp 505–516
Kimelfeld B, Sagiv Y (2006) Finding and approximating top-k answers in keyword proximity search. In: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS’06). ACM, New York, NY, USA, pp 173–182
Li G, Ooi BC, Feng J, Wang J, Zhou L (2008) EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’08). ACM, New York, NY, USA, pp 903–914
Liu F, Yu C, Meng W, Chowdhury A (2006) Effective keyword search in relational databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’06). ACM, New York, NY, USA, pp 563–574
Liu Z, Chen Y (2007) Identifying meaningful return information for XML keyword search. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 329–340
Luo Y, Lin X, Wang W, Zhou X (2007) SPARK: top-k keyword query in relational databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 115–126
Ng RT, Wagner AS, Yin Y (2001) Iceberg-cube computation with PC clusters. In: Proceedings of the ACM-SIGMOD international conference on management of data (SIGMOD’01). Santa Barbara, CA
Park J, Lee SG (2010) Keyword search in relational databases. Knowl Inf Syst (Online First). doi:10.1007/s10115-010-0284-1
Qin L, Yu JX, Chang L (2009a) Keyword search in databases: the power of RDBMS. In: Proceedings of the 35th SIGMOD international conference on management of data (SIGMOD’09). ACM Press, Providence, Rhode Island, USA, pp 681–694
Qin L, Yu JX, Chang L, Tao Y (2009b) Querying communities in relational databases. In: Proceedings of the 25th international conference on data engineering (ICDE’09). IEEE, pp 724–735
Taha K, Elmasri R (2010) BusSEngine: a business search engine. Knowl Inf Syst 23(2): 153–197
Tong H, Faloutsos C, Pan JY (2008) Random walk with restart: fast solutions and applications. Knowl Inf Syst 14(3): 327–346
Vu QH, Ooi BC, Papadias D, Tung AKH (2008) A graph method for keyword-based selection of the top-k databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’08). ACM, New York, NY, USA
Weikum G (2007) DB&IR: both sides now. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 25–30
Wu P, Sismanis Y, Reinwald B (2007) Towards keyword-driven analytical processing. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 617–628
Xin D, Han J, Li X, Wah BW (2003) Star-cubing: computing iceberg cubes by top-down and bottom-up integration. In: Proceedings of the international conference on very large data bases (VLDB’03). Berlin, Germany, pp 476–487
Yu B, Li G, Sollins K, Tung AKH (2007) Effective keyword-based selection of relational databases. In: Proceedings of the ACM SIGMOD international conference on management of data (SIGMOD’07). ACM, New York, NY, USA, pp 139–150
Zhou B, Pei J (2009) Answering aggregate keyword queries on relational databases using minimal group-bys. In: Proceedings of the 12th international conference on extending database technology (EDBT’09). Saint-Petersburg, Russia
Additional information
A preliminary version of this paper appears as [42]. The authors are grateful to the anonymous reviewers and the associate editor for their constructive comments, which helped improve the quality of the paper. This research is supported in part by an NSERC Discovery Grant, an NSERC Discovery Accelerator Supplement Grant, and a British Columbia Natural Resources and Applied Sciences Endowment Fund. All opinions, findings, conclusions, and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
About this article
Cite this article
Zhou, B., Pei, J. Aggregate keyword search on large relational databases. Knowl Inf Syst 30, 283–318 (2012). https://doi.org/10.1007/s10115-011-0379-3
DOI: https://doi.org/10.1007/s10115-011-0379-3