Abstract
Code search recommends relevant source code according to a developer's intent, expressed as a query statement, and thereby improves the efficiency of software development. Deep code search models are usually trained with code descriptions in place of real query statements. However, the heterogeneity between query statements and code descriptions seriously degrades the accuracy of these models. To address this shortcoming, this paper proposes SIQE, a sentence-integrated query expansion method. Unlike previous query expansion methods, which expand queries at the word level, SIQE uses entire code description fragments as the source of expansion, and it compensates for the heterogeneity between queries and descriptions by learning the mapping between them. To verify the effectiveness of the proposed model, the paper conducts and analyzes code search experiments in two languages, Python and Java. Experimental results show that, compared with the baseline model, SIQE achieves better code search results. SIQE can therefore improve the retrieval effectiveness of query statements, increase the accuracy of code search, and further support software engineering practice.
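The abstract contrasts word-level query expansion with SIQE's use of whole code description fragments. The Python sketch below illustrates only that contrast, under stated assumptions: it picks the most similar description from a small hypothetical corpus using bag-of-words cosine similarity and appends the whole sentence to the query. It is not the authors' SIQE model, which learns a mapping between queries and descriptions; the functions `tokenize`, `cosine`, and `expand_query` and the sample descriptions are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer).
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand_query(query: str, descriptions: list[str], k: int = 1) -> str:
    """Append the k most similar *whole* descriptions to the query (sentence-level expansion).

    Assumption: similarity is plain bag-of-words cosine, standing in for a learned
    query-description mapping. sorted() is stable, so ties keep corpus order.
    """
    q_vec = Counter(tokenize(query))
    ranked = sorted(descriptions, key=lambda d: cosine(q_vec, Counter(tokenize(d))), reverse=True)
    return " ".join([query, *ranked[:k]])


# Hypothetical corpus of code descriptions.
descriptions = [
    "Read a CSV file into a list of rows.",
    "Sort a dictionary by its values in descending order.",
    "Open a file and return its lines.",
]

print(expand_query("sort dict by value", descriptions))
# → sort dict by value Sort a dictionary by its values in descending order.
```

The query says "dict" and "value" while the matched description says "dictionary" and "values"; appending the whole sentence brings those description-side words into the query without a word-by-word synonym table, which is the intuition behind sentence-level expansion.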
Acknowledgement
This work is supported by the National Natural Science Foundation of China [Grant No. 61872139] and the Research Project of Hunan Provincial Education Department [Grant No. 22C0600].
Copyright information
© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Liu, X., Liu, J., Hu, H., Liu, Y. (2024). Enrich Code Search Query Semantics with Raw Descriptions. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-54521-4_16
DOI: https://doi.org/10.1007/978-3-031-54521-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54520-7
Online ISBN: 978-3-031-54521-4
eBook Packages: Computer Science, Computer Science (R0)