Enrich Code Search Query Semantics with Raw Descriptions

  • Conference paper
Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2023)

Abstract

Code search recommends relevant source code according to a developer's intent, expressed as a query statement, and thereby improves the efficiency of software development. Deep code search models are typically trained with code descriptions standing in for query statements. However, the heterogeneity between query statements and code descriptions seriously degrades the accuracy of these models. To address this shortcoming, this paper proposes a sentence-integrated query expansion method, SIQE. Unlike previous query expansion methods, which expand queries at the word level, SIQE uses entire code description fragments as the source of query expansion. By learning the mapping between query statements and code descriptions, SIQE compensates for the heterogeneity between them. To verify the effectiveness of the proposed model on code search tasks, this paper conducts code search experiments and analyses on two languages, Python and Java. Experimental results show that SIQE achieves better code search results than the baseline model. SIQE can therefore enrich the semantics of query statements and improve the accuracy of code search, which in turn benefits software development.
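The abstract does not describe SIQE's internals, so what follows is only a minimal Python sketch of the general idea of sentence-level expansion: rather than adding individual synonym words, a whole code description is appended to the query. As an assumption for illustration, the learned query-to-description mapping is replaced here by plain bag-of-words cosine similarity, and the names `tokenize`, `cosine`, and `expand_query` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_query(query: str, descriptions: list[str], k: int = 1) -> str:
    """Sentence-level expansion: append the k most similar whole descriptions.

    Assumption: lexical cosine similarity stands in for SIQE's learned
    mapping between query statements and code descriptions.
    """
    q_vec = Counter(tokenize(query))
    ranked = sorted(
        descriptions,
        key=lambda d: cosine(q_vec, Counter(tokenize(d))),
        reverse=True,  # sorted() is stable, so ties keep corpus order
    )
    return " ".join([query, *ranked[:k]])


if __name__ == "__main__":
    corpus = [
        "Read a JSON file and return a dictionary.",
        "Sort a list of integers in place.",
        "Open a socket connection to a remote host.",
    ]
    print(expand_query("load json from file", corpus))
```

The expanded query can then be passed to any retrieval model in place of the original query. In SIQE the selection of description fragments is learned rather than lexical, so this sketch only shows where sentence-level expansion sits in the pipeline.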

Acknowledgement

This work is supported by the National Natural Science Foundation of China [Grant No. 61872139] and the Research Project of Hunan Provincial Education Department [Grant No. 22C0600].

Author information

Corresponding author: Jianxun Liu.

Copyright information

© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Liu, X., Liu, J., Hu, H., Liu, Y. (2024). Enrich Code Search Query Semantics with Raw Descriptions. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-54521-4_16

  • DOI: https://doi.org/10.1007/978-3-031-54521-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54520-7

  • Online ISBN: 978-3-031-54521-4

  • eBook Packages: Computer Science, Computer Science (R0)
