Enrich Code Search Query Semantics with Raw Descriptions

Liu, Xiangzheng; Liu, Jianxun; Hu, Haize; Liu, Yi

doi:10.1007/978-3-031-54521-4_16

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 561)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing

Abstract

Code search recommends relevant source code according to a developer's intention, expressed as a query statement, and thereby improves the efficiency of software development. Deep code search models are typically trained with code descriptions standing in for query statements. However, the heterogeneity between query statements and code descriptions seriously degrades the accuracy of these models. To address this shortcoming, this paper proposes SIQE, a sentence-integrated query expansion method. Unlike previous query expansion methods, which expand queries at the word level, SIQE uses entire code description fragments as the source of query expansion, and it compensates for the heterogeneity between query statements and code descriptions by learning the mapping between them. To verify the effect of the proposed model on code search tasks, the paper conducts and analyzes code search experiments on two languages, Python and Java. The experimental results show that SIQE achieves better code search results than the baseline model. SIQE can therefore make query statements more effective, improve the accuracy of code search, and further advance software engineering.
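
The abstract gives no implementation details, so the following is only a toy illustration of what sentence-level query expansion means, as opposed to word-level expansion. It is not the SIQE model. It assumes a small pool of code descriptions, uses plain token-overlap cosine similarity in place of SIQE's learned query-to-description mapping, and appends the whole best-matching description to the query. All names (`tokenize`, `cosine`, `expand_query`, `DESCRIPTIONS`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_query(query, descriptions):
    """Append the most similar *whole* description to the query.

    Assumption: token overlap stands in for a learned mapping between
    queries and descriptions; a real system would use a trained model.
    """
    q = Counter(tokenize(query))
    best = max(descriptions, key=lambda d: cosine(q, Counter(tokenize(d))))
    return f"{query} {best}"


# Hypothetical pool of code descriptions (e.g. docstrings).
DESCRIPTIONS = [
    "Read a text file and return its contents as a list of lines.",
    "Sort a list of integers in ascending order.",
    "Send an HTTP GET request and return the response body.",
]

if __name__ == "__main__":
    print(expand_query("how to read file lines", DESCRIPTIONS))
```

The expanded query carries the full wording of a description, so a downstream retrieval model trained on descriptions sees input closer to its training distribution than the short original query.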

Acknowledgement

This work is supported by the National Natural Science Foundation of China [Grant No. 61872139] and the Research Project of Hunan Provincial Education Department [Grant No. 22C0600].

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, Hunan, China
Xiangzheng Liu, Jianxun Liu, Haize Hu & Yi Liu
Hunan Provincial Key Laboratory for Services Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan, Hunan, China
Xiangzheng Liu, Jianxun Liu, Haize Hu & Yi Liu

Corresponding author

Correspondence to Jianxun Liu.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
Xi’an Jiaotong-Liverpool, Suzhou, China
Xinheng Wang
University of Peloponnese, Patra, Greece
Nikolaos Voros

Cite this paper

Liu, X., Liu, J., Hu, H., Liu, Y. (2024). Enrich Code Search Query Semantics with Raw Descriptions. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-54521-4_16

DOI: https://doi.org/10.1007/978-3-031-54521-4_16
Published: 23 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54520-7
Online ISBN: 978-3-031-54521-4
eBook Packages: Computer Science, Computer Science (R0)
