Identifying SQL Queries Similarity Using SVM

  • Conference paper
Neural Information Processing (ICONIP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9492)

Abstract

Recognizing that two SQL queries are similar is useful for many applications, such as query recommendation and plan selection. However, questions such as which techniques are needed and which SQL query representation yields the most accurate similarity estimates remain poorly addressed.

In this work, we explore two SQL query representations proposed in the literature and study how accurately SVM predicts the similarity of SQL queries using these representations. We use RBF and polynomial kernels to build SVM models. As an additional contribution, we compute a personalized kernel and compare it against the kernels cited above. Results show that one of the studied representations gives better results than the other, and that our proposed kernel is comparable to the RBF kernel in terms of accuracy.
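For readers unfamiliar with the two standard kernels named above, the following is a minimal sketch of how each one scores a pair of queries, assuming (hypothetically) that every query has already been encoded as a fixed-length binary feature vector. The vectors, the function names, and the parameter values `gamma`, `degree`, and `coef0` are illustrative assumptions, not the representations or settings used in the paper; the authors' own kernel is not described in the abstract and is not reproduced here.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * ||x - y||^2). Equals 1.0 for identical vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


# Hypothetical encoding (an assumption): each slot marks whether a given
# table or attribute appears in the query.
q1 = [1, 1, 0, 1, 0]
q2 = [1, 1, 0, 0, 0]  # shares two elements with q1
q3 = [0, 0, 1, 0, 1]  # shares nothing with q1

for name, other in (("q2", q2), ("q3", q3)):
    print(name, rbf_kernel(q1, other), polynomial_kernel(q1, other))
```

Both kernels assign a higher value to the pair (q1, q2), which shares elements, than to the pair (q1, q3), which shares none. An SVM trained with either kernel relies on these pairwise values in place of explicit feature-space coordinates.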

An erratum to this chapter is available at 10.1007/978-3-319-26561-2_81.

Author information

Corresponding author

Correspondence to Jihad Zahir.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zahir, J., El Qadi, A., Bellatreche, L. (2015). Identifying SQL Queries Similarity Using SVM. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science, vol 9492. Springer, Cham. https://doi.org/10.1007/978-3-319-26561-2_77

  • DOI: https://doi.org/10.1007/978-3-319-26561-2_77

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26560-5

  • Online ISBN: 978-3-319-26561-2

  • eBook Packages: Computer Science, Computer Science (R0)
