Top-k coupled keyword recommendation for relational keyword queries

  • Regular Paper
Knowledge and Information Systems

Abstract

Users who cannot formulate appropriate queries to express imprecise query intentions would benefit from being offered the top-k typical relevant keyword queries. By extracting the semantic relationships both between keywords and between keyword queries, this paper proposes a new keyword query suggestion approach that provides typical queries semantically related to a given query. First, a keyword coupling relationship measure is proposed that considers both the intra- and inter-couplings between each pair of keywords. Then, the semantic similarity of different keyword queries is measured using a semantic matrix in which the coupling relationships between the keywords in the queries are preserved. Based on the query semantic similarities, an approximation algorithm that uses probability density estimation is proposed to find the most typical queries in the query history. Lastly, a threshold-based top-k query selection method is proposed to efficiently evaluate the top-k typical relevant queries. We demonstrate that the keyword coupling relationship and query semantic similarity measures accurately capture the coupling relationships between keywords and the semantic similarities between keyword queries. The efficiency of the query typicality analysis and of the top-k query selection algorithm is also demonstrated.
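
The idea of scoring query typicality by probability density estimation can be illustrated with a minimal sketch. It assumes that a pairwise query similarity matrix with values in [0, 1] has already been computed, and it estimates the typicality of each query as a Gaussian kernel density over the distances 1 - similarity. The function names, the Gaussian kernel, and the bandwidth value are illustrative assumptions; this is neither the approximation algorithm nor the threshold-based selection method of the paper, and the final ranking is a plain exact top-k.

```python
import heapq
import math


def typicality_scores(sim, bandwidth=0.5):
    """Kernel density estimate of each query's typicality.

    sim[i][j] is an assumed, precomputed similarity in [0, 1] between
    queries i and j; the distance is taken as 1 - sim[i][j].
    """
    n = len(sim)
    # Normalising constant of a one-dimensional Gaussian kernel.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            d = 1.0 - sim[i][j]
            total += math.exp(-(d * d) / (2.0 * bandwidth ** 2))
        scores.append(norm * total)
    return scores


def top_k_typical(sim, k, bandwidth=0.5):
    """Return the indices of the k queries with the highest typicality."""
    scores = typicality_scores(sim, bandwidth)
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical query history: three mutually similar queries and one outlier.
    sim = [
        [1.0, 0.9, 0.8, 0.1],
        [0.9, 1.0, 0.85, 0.2],
        [0.8, 0.85, 1.0, 0.1],
        [0.1, 0.2, 0.1, 1.0],
    ]
    print(top_k_typical(sim, k=2))
```

A query that lies close to many other queries in the history receives a high density and is therefore treated as typical, while an isolated query receives a low one. The exact computation shown here is quadratic in the number of queries.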

Acknowledgments

This work is supported by the National Science Foundation for Young Scientists of China (No. 61003162) and the Young Scholars Growth Plan of Liaoning (No. LJQ2013038).

Author information

Corresponding author

Correspondence to Xiangfu Meng.

About this article

Cite this article

Meng, X., Cao, L., Zhang, X. et al. Top-k coupled keyword recommendation for relational keyword queries. Knowl Inf Syst 50, 883–916 (2017). https://doi.org/10.1007/s10115-016-0959-3

  • DOI: https://doi.org/10.1007/s10115-016-0959-3
