HQLgen: deep learning based HQL query generation from program context

Automated Software Engineering

Abstract

To facilitate Object-Oriented Programming (OOP) in data persistence, practitioners use Object-Relational Mapping (ORM) frameworks to map data bidirectionally between data classes and the tables of a Relational Database Management System (RDBMS). For Java applications, the most popular ORM solution is Hibernate, which provides the Hibernate Query Language (HQL) for writing customizable queries in an OOP style. However, HQL queries are hard to implement and maintain because of their flexibility and complexity. To address these issues, we propose HQLgen, a model that combines deep learning with templates to automatically generate HQL queries from program context. It employs a recurrent neural network to learn the contextual information of a Java program and predicts the key elements of HQL clauses via an attention mechanism. To construct a dataset for model training and evaluation, we locate and extract projects containing HQL queries on GitHub, apply extensive cleaning and preprocessing, and finally obtain 24,118 HQL queries from 3,481 projects. Experimental results show that the proposed approach achieves an accuracy of 34.52% on predicting simple HQL queries. In addition, we release the collected dataset for future research.
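The abstract describes HQLgen as predicting the key elements of an HQL query with attention and then assembling the query from a template. The Java sketch that follows is not the authors' implementation; it only illustrates that general idea under stated assumptions. The template `from {Entity} e where e.{attribute} = :{attribute}`, the hand-written two-dimensional embeddings, and the names `HqlTemplateSketch`, `attendAndPick`, and `fillTemplate` are all hypothetical. In a real model, the slot query vectors and candidate representations would come from the trained recurrent encoder rather than being hard-coded.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not HQLgen itself): fill a simple HQL template whose
 * slots (entity, attribute) are chosen by dot-product attention over
 * candidate identifiers taken from the surrounding Java code.
 */
public class HqlTemplateSketch {

    // Softmax over raw scores; subtracting the max keeps Math.exp numerically stable.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Score each candidate against the slot's query vector, normalize the scores
    // into attention weights, and return the candidate with the highest weight.
    static String attendAndPick(double[] query, List<String> candidates,
                                Map<String, double[]> embeddings) {
        double[] scores = new double[candidates.size()];
        for (int i = 0; i < candidates.size(); i++) {
            scores[i] = dot(query, embeddings.get(candidates.get(i)));
        }
        double[] weights = softmax(scores);
        int best = 0;
        for (int i = 1; i < weights.length; i++) {
            if (weights[i] > weights[best]) best = i;
        }
        return candidates.get(best);
    }

    // Hypothetical single-condition query template with a named parameter.
    static String fillTemplate(String entity, String attribute) {
        return "from " + entity + " e where e." + attribute + " = :" + attribute;
    }

    public static void main(String[] args) {
        // Hypothetical embeddings for identifiers found in the program context.
        Map<String, double[]> emb = Map.of(
                "User", new double[]{1.0, 0.0},
                "Order", new double[]{0.2, 0.1},
                "email", new double[]{0.0, 1.0},
                "id", new double[]{0.1, 0.3});
        String entity = attendAndPick(new double[]{1.0, 0.0}, List.of("User", "Order"), emb);
        String attr = attendAndPick(new double[]{0.0, 1.0}, List.of("email", "id"), emb);
        System.out.println(fillTemplate(entity, attr));
    }
}
```

Running `main` prints `from User e where e.email = :email`. Picking the largest softmax weight is equivalent to picking the largest raw score; the softmax is kept only to make the attention distribution explicit.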

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Notes

  1. https://www.javaguides.net/2018/11/spring-data-jpa-query-creation-from-method-names.html.

  2. https://www.baeldung.com/querydsl-with-jpa-tutorial.

  3. http://boa.cs.iastate.edu/boa/index.php?q=boa/job/93375.

  4. http://javaparser.org.

  5. https://github.com/zy-zhou/HQLgen.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61772200) and the Shanghai Natural Science Foundation (No. 21ZR1416300).

Author information

Corresponding author

Correspondence to Huiqun Yu.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, Z., Yu, H., Fan, G. et al. HQLgen: deep learning based HQL query generation from program context. Autom Softw Eng 29, 55 (2022). https://doi.org/10.1007/s10515-022-00359-5

  • DOI: https://doi.org/10.1007/s10515-022-00359-5