FuEPRe: a fusing embedding method with attention for post recommendation

  • Original Research Paper
Service Oriented Computing and Applications

Abstract

Post recommendation aims to find posts on QA websites that are relevant to a user’s problem and help solve it. However, identifying the most relevant post among the large number of posts related to a problem is challenging. This paper proposes a novel recommendation model called FuEPRe, which uses a multi-head self-attention network to integrate three types of information: semantic information, structural information of code, and description information. Given a user’s query, it recommends relevant Stack Overflow posts, helping users solve problems quickly and addressing the inaccuracy of earlier post recommendation approaches. Each code-description pair is represented as two vectors, and the three types of information are fused into these two vectors through an attention mechanism, so that each vector carries all three. Posts are then recommended by comparing the similarity between the vectors. The proposed approach is evaluated on the Stack Overflow Posts dataset, and the results demonstrate that it outperforms some state-of-the-art methods in the post recommendation task. Specifically, the approach improves the recall, MRR, and NDCG of recommendations, enabling programmers to solve problems faster.
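
To make the final step concrete, the following is a minimal sketch of similarity-based ranking and of the three reported metric families. It is not the FuEPRe implementation: the abstract does not state which similarity function or which exact metric definitions are used, so cosine similarity, binary relevance, and the cutoff k are assumptions, and all function names and the toy vectors are hypothetical. The attention-based fusion that produces the vectors is not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_posts(query_vec, post_vecs):
    """Return post ids sorted by descending similarity to the query vector."""
    scores = {pid: cosine(query_vec, vec) for pid, vec in post_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant posts that appear in the top k results."""
    hits = sum(1 for pid in ranked[:k] if pid in relevant)
    return hits / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant post; MRR is the mean over all queries."""
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, pid in enumerate(ranked[:k], start=1)
              if pid in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

# Toy data (hypothetical): one query vector and three post vectors.
query = [1.0, 0.0]
posts = {"p1": [0.0, 1.0], "p2": [1.0, 0.0], "p3": [1.0, 1.0]}
relevant = {"p3"}

ranked = rank_posts(query, posts)
print(ranked)
print(recall_at_k(ranked, relevant, 2))
print(reciprocal_rank(ranked, relevant))
print(ndcg_at_k(ranked, relevant, 3))
```

In this toy case the single relevant post is ranked second, so it counts toward recall at k = 2 but not at k = 1, and both the reciprocal rank and NDCG are penalized for it not being first.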

Notes

  1. https://archive.org/details/stackexchange.

  2. https://huggingface.co/Salesforce/codet5-base.

  3. https://github.com/microsoft/CodeBERT.

  4. https://github.com/microsoft/CodeBERT/tree/master/GraphCodeBERT.

  5. https://huggingface.co/microsoft/codebert-base.

  6. https://github.com/github/CodeSearchNet.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. U2241216) and the Opening Fund of Key Laboratory of Civil Aviation Emergency Science and Technology (CAAC) (No. NJ2022022).

Author information

Corresponding author

Correspondence to Guohua Shen.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, X., Shen, G., Huang, Z. et al. FuEPRe: a fusing embedding method with attention for post recommendation. SOCA 18, 67–79 (2024). https://doi.org/10.1007/s11761-024-00386-y

  • DOI: https://doi.org/10.1007/s11761-024-00386-y
