Abstract
Providing the top-k typical relevant keyword queries benefits users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships between keywords and between keyword queries, this paper proposes a new keyword query suggestion approach that provides typical queries semantically related to a given query. First, we propose a keyword coupling relationship measure that considers both the intra- and inter-couplings between each pair of keywords. The semantic similarity between keyword queries is then measured using a semantic matrix in which the coupling relationships between the keywords in the queries are preserved. Based on these query semantic similarities, we next propose an approximation algorithm that finds the most typical queries in the query history by using probability density estimation. Lastly, we propose a threshold-based top-k query selection method to efficiently evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures accurately capture the coupling relationships between keywords and the semantic similarities between keyword queries. The efficiency of the query typicality analysis and top-k query selection algorithms is also demonstrated.
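To make the typicality step concrete, the following minimal sketch scores each query in a history by a Gaussian kernel density estimate over pairwise query distances and returns the k highest-scoring queries. It is an illustration under stated assumptions, not the method of this paper: the Jaccard overlap stands in for the coupling-based semantic similarity, the distance is taken as 1 minus similarity, the bandwidth value is arbitrary, and the scores are computed exactly and then sorted rather than with the approximation algorithm and threshold-based selection described above. All function names are hypothetical.

```python
import math


def jaccard_similarity(q1, q2):
    """Stand-in similarity: keyword-set overlap.

    Assumption: this is NOT the paper's coupling-based semantic similarity;
    it only gives the sketch something to measure with.
    """
    a, b = set(q1), set(q2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def typicality_scores(queries, similarity=jaccard_similarity, bandwidth=0.5):
    """Gaussian-kernel density estimate of every query over the history.

    Assumptions: distance = 1 - similarity, and a fixed bandwidth.
    Exact O(n^2) computation, with no approximation.
    """
    n = len(queries)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    scores = []
    for q in queries:
        total = 0.0
        for other in queries:
            d = 1.0 - similarity(q, other)
            total += math.exp(-(d * d) / (2.0 * bandwidth ** 2))
        scores.append(norm * total)
    return scores


def top_k_typical(queries, k, **kwargs):
    """Return the k (query, score) pairs with the highest typicality."""
    scores = typicality_scores(queries, **kwargs)
    ranked = sorted(range(len(queries)), key=lambda i: scores[i], reverse=True)
    return [(queries[i], scores[i]) for i in ranked[:k]]


# Toy query history (invented for illustration only).
history = [
    ("cheap", "hotel", "beijing"),
    ("budget", "hotel", "beijing"),
    ("cheap", "hotel", "shanghai"),
    ("used", "car", "price"),
]

if __name__ == "__main__":
    for query, score in top_k_typical(history, k=2):
        print(" ".join(query), round(score, 4))
```

In this toy history the first query shares keywords with two others, so it lies in the densest region and receives the highest score, while the unrelated car query receives the lowest. Any other similarity function with values in [0, 1] can be passed through the `similarity` parameter.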











Acknowledgments
This work is supported by the National Science Foundation for Young Scientists of China (No. 61003162) and the Young Scholars Growth Plan of Liaoning (No. LJQ2013038).
Cite this article
Meng, X., Cao, L., Zhang, X. et al. Top-k coupled keyword recommendation for relational keyword queries. Knowl Inf Syst 50, 883–916 (2017). https://doi.org/10.1007/s10115-016-0959-3
DOI: https://doi.org/10.1007/s10115-016-0959-3