A graph-theoretic approach to optimize keyword queries in relational databases

  • Regular Paper
Knowledge and Information Systems

Abstract

Keyword search provides users with an easy way to query large and complex databases without any knowledge of structured query languages or of the underlying database schema. Most existing studies have focused on generating candidate structured queries relevant to the keywords. Because the set of generated queries is large, the cost of executing them may be prohibitive. However, existing studies lack a general method for optimizing the evaluation plan of such a large set of generated queries. In this paper, we introduce a graph-theoretic optimization approach. We propose a general graph model, the Weighted Operator Graph, to model the costs of keyword query evaluation plans. The model is flexible enough to integrate all cost-based plans in a uniform way. We formulate the Keyword Query Optimization Problem, based on a theoretical cost model, as a graph-theoretic problem and show that it is NP-hard. We propose a greedy heuristic, Maximum Propagation, that reduces the size of intermediate results as early as possible and thereby lowers query evaluation costs. Experiments on both synthetic and real data sets show that our approach outperforms existing work.
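Two ideas in the abstract can be made concrete with a toy example: candidate queries generated from the same keywords share operators, so evaluating each shared operator once lowers the total cost; and a good plan shrinks intermediate results as early as possible. The Python sketch below is not the paper's Weighted Operator Graph or its Maximum Propagation algorithm. It assumes a simplified cost model (cost equals estimated rows produced), uses hypothetical relation names, operator names, cardinalities, and selectivities, and substitutes a classic greedy join-ordering heuristic in the spirit of Greedy Operator Ordering. Because the general optimization problem is NP-hard, greedy choices like these trade guaranteed optimality for speed.

```python
from itertools import combinations

# Hypothetical inputs: rows per relation after keyword selections, and
# join selectivities between relations linked by a foreign key.
CARD = {"Author": 2, "Writes": 1000, "Paper": 5}
SEL = {
    frozenset(("Author", "Writes")): 0.01,
    frozenset(("Writes", "Paper")): 0.002,
}

# Hypothetical batch of two candidate queries that share three operators;
# each operator's cost is its estimated number of output rows.
OP_COST = {"sel_author_k1": 2, "scan_writes": 1000, "join_aw": 20,
           "join_awp": 5, "join_awc": 8}
QUERIES = [
    ["sel_author_k1", "scan_writes", "join_aw", "join_awp"],
    ["sel_author_k1", "scan_writes", "join_aw", "join_awc"],
]


def greedy_join_order(card, sel):
    """Repeatedly join the pair of connected groups whose estimated result
    is smallest, so intermediate results shrink as early as possible.

    card: {relation: estimated rows}
    sel:  {frozenset({r1, r2}): join selectivity}; graph must be connected.
    Returns [(sorted relations in the new group, estimated rows), ...].
    """
    groups = {frozenset([r]): float(n) for r, n in card.items()}
    steps = []
    while len(groups) > 1:
        best = None
        for a, b in combinations(groups, 2):
            # Collect selectivities of all predicates linking group a to b.
            preds = [sel[frozenset((x, y))] for x in a for y in b
                     if frozenset((x, y)) in sel]
            if not preds:
                continue  # skip Cartesian products
            size = groups[a] * groups[b]
            for s in preds:
                size *= s
            if best is None or size < best[0]:
                best = (size, a, b)
        if best is None:
            raise ValueError("join graph is disconnected")
        size, a, b = best
        del groups[a], groups[b]
        groups[a | b] = size
        steps.append((sorted(a | b), size))
    return steps


def batch_cost(queries, op_cost):
    """Cost of a batch of candidate queries (lists of operator names),
    evaluated separately vs. with common operators computed only once."""
    unshared = sum(op_cost[op] for plan in queries for op in plan)
    shared = sum(op_cost[op] for op in set().union(*queries))
    return unshared, shared


if __name__ == "__main__":
    for group, rows in greedy_join_order(CARD, SEL):
        print(f"join -> {group}: ~{rows:g} rows")
    unshared, shared = batch_cost(QUERIES, OP_COST)
    print(f"unshared cost {unshared}, shared cost {shared}")
```

With these numbers the greedy step joins Writes with Paper first (about 10 rows) rather than Author with Writes (about 20 rows), and evaluating the three shared operators once cuts the batch cost from 2057 to 1035 estimated rows.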

Figs. 1–15 (images not included)

Acknowledgments

This research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, in the ICT R&D Program 2013.

Author information

Corresponding author

Correspondence to Jaehui Park.

About this article

Cite this article

Park, J., Lee, Sg. A graph-theoretic approach to optimize keyword queries in relational databases. Knowl Inf Syst 41, 843–870 (2014). https://doi.org/10.1007/s10115-013-0690-2

  • DOI: https://doi.org/10.1007/s10115-013-0690-2