Abstract
Keyword search can provide users with an easy way to query large and complex databases without any knowledge of structured query languages or the underlying database schema. Most existing studies have focused on generating candidate structured queries relevant to the keywords. Because the number of generated queries is large, the cost of executing them may be prohibitive. However, existing studies lack a generalized method for optimizing the evaluation plan of this large set of generated queries. In this paper, we introduce a graph-theoretic optimization approach. We propose a general graph model, the Weighted Operator Graph, to capture the costs of keyword query evaluation plans. The proposed model is flexible enough to integrate all cost-based plans in a uniform way. We define the Keyword Query Optimization Problem, based on a theoretical cost model, as a graph-theoretic problem and show that it is NP-hard. We propose a greedy heuristic, Maximum Propagation, that reduces the size of intermediate results as early as possible. The proposed algorithm lowers query evaluation costs. Experimental studies on both synthetic and real data sets show that our work outperforms existing approaches.
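The abstract describes the optimization only at a high level. As a rough illustration of the two ideas it names, sharing work across the many generated queries and shrinking intermediate results early, the Python sketch below greedily picks, within each query, the join whose estimated result is smallest, and reuses any intermediate result already built for an earlier query. This is not the paper's Weighted Operator Graph or its Maximum Propagation algorithm. The input format (each query as a tree of join edges over distinct relation names), the cardinalities and selectivities, the independence-based size estimate, and the names `estimate` and `greedy_shared_plan` are all hypothetical assumptions made for illustration.

```python
from math import prod


def estimate(rels, edges, sizes, sel):
    """Estimated cardinality of joining `rels` along `edges`.

    Assumption: join predicates are independent, so the estimate is the
    product of the base sizes times the product of the edge selectivities.
    """
    return prod(sizes[r] for r in rels) * prod(sel[e] for e in edges)


def greedy_shared_plan(queries, sizes, sel):
    """Hypothetical sketch: plan many join queries while sharing intermediate results.

    queries: list of queries; each query is a list of join edges (a, b) forming a tree.
    sizes:   relation name -> estimated cardinality of its keyword-filtered tuple set.
    sel:     frozenset({a, b}) -> selectivity of the join predicate between a and b.
    Returns (total_cost, materialized): total_cost sums the estimated sizes of every
    intermediate result that had to be built; materialized holds each built result
    as a (relations, edges) pair, so no result is built twice.
    """
    materialized = set()
    total_cost = 0.0
    for query in queries:
        edges_left = [frozenset(e) for e in query]
        # Every relation starts as its own component: (relations, internal edges).
        comp_of = {r: (frozenset([r]), frozenset()) for e in edges_left for r in e}
        while edges_left:
            best = None
            for e in edges_left:
                a, b = tuple(e)
                ra, ea = comp_of[a]
                rb, eb = comp_of[b]
                merged = (ra | rb, ea | eb | {e})
                size = estimate(merged[0], merged[1], sizes, sel)
                # Reusing a result built for an earlier query costs nothing extra.
                cost = 0.0 if merged in materialized else size
                # Greedy rule: cheapest incremental cost first, then smallest result.
                if best is None or (cost, size) < best[:2]:
                    best = (cost, size, e, merged)
            cost, _, e, merged = best
            if merged not in materialized:
                materialized.add(merged)
                total_cost += cost
            edges_left.remove(e)
            # Every relation in the merged component now points to it.
            for r in merged[0]:
                comp_of[r] = merged
    return total_cost, materialized


if __name__ == "__main__":
    # Toy inputs (made up for illustration): two queries that share the A-B join.
    sizes = {"A": 10, "B": 1000, "C": 20}
    sel = {frozenset({"A", "B"}): 0.01, frozenset({"B", "C"}): 0.001}
    queries = [[("A", "B")], [("A", "B"), ("B", "C")]]
    total, built = greedy_shared_plan(queries, sizes, sel)
    print(round(total, 6), len(built))
```

With these toy numbers, the second query reuses the A⋈B result built for the first, so the estimated total is about 102 units; planning each query in isolation would build A⋈B twice, for about 202. Run on the second query alone, the greedy rule joins B and C first (estimate 20) rather than A and B (estimate 100), which is the "shrink intermediate results early" idea in miniature.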
Acknowledgments
This research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, in the ICT R&D Program 2013.
About this article
Cite this article
Park, J., Lee, Sg. A graph-theoretic approach to optimize keyword queries in relational databases. Knowl Inf Syst 41, 843–870 (2014). https://doi.org/10.1007/s10115-013-0690-2
DOI: https://doi.org/10.1007/s10115-013-0690-2