Abstract
Recognizing that two SQL queries are similar is useful in many applications, such as query recommendation and plan selection. However, questions such as which techniques are needed and which SQL query representation yields the most accurate similarity estimates remain poorly addressed.
In this work, we explore two SQL query representations proposed in the literature and study how accurately SVM predicts the similarity of SQL queries using these representations. We use RBF and polynomial kernels to build SVM models. As an additional contribution, we compute a personalized kernel and compare it against the kernels cited above. Results show that one of the studied representations gives better results than the other, and that our proposed kernel is comparable to the RBF kernel in terms of accuracy.
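As a rough illustration of the two standard kernels named in the abstract, the sketch below evaluates an RBF kernel and a polynomial kernel on two numeric feature vectors. It is not the authors' implementation: the vectors `q1` and `q2`, the function names, and the parameter values (`gamma`, `degree`, `coef0`) are assumptions chosen for the example, and the abstract does not say how queries are encoded as features.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

# Hypothetical feature vectors standing in for two encoded SQL queries.
q1 = [1.0, 0.0, 2.0]
q2 = [1.0, 1.0, 2.0]

print(rbf_kernel(q1, q2))   # approaches 1.0 as the vectors get closer
print(poly_kernel(q1, q2))  # grows with the dot product of the vectors
```

Both functions return larger values for vectors that are more alike, which is the property an SVM relies on when it is trained on such kernel values.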
An erratum to this chapter is available at 10.1007/978-3-319-26561-2_81.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zahir, J., El Qadi, A., Bellatreche, L. (2015). Identifying SQL Queries Similarity Using SVM. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9492. Springer, Cham. https://doi.org/10.1007/978-3-319-26561-2_77
DOI: https://doi.org/10.1007/978-3-319-26561-2_77
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26560-5
Online ISBN: 978-3-319-26561-2
eBook Packages: Computer Science, Computer Science (R0)