DOI: 10.1145/3539618.3591860
short-paper
Open Access

Improving Programming Q&A with Neural Generative Augmentation

Published: 18 July 2023

ABSTRACT

Knowledge-intensive programming Q&A is an active research area in industry. Its application boosts developer productivity by helping developers quickly find programming answers in the vast amount of information on the Internet. In this study, we propose ProQANS and its variants ReProQANS and ReAugProQANS to tackle programming Q&A. ProQANS is a neural search approach that leverages unlabeled data on the Internet (such as StackOverflow) to mitigate the cold-start problem. ReProQANS extends ProQANS by utilizing reformulated queries with a novel triplet loss. To build a further variant, ReAugProQANS, we use an auxiliary generative model to augment the training queries and design a novel dual triplet loss function to adapt to these generated queries. In our experiments, ReProQANS performs best on the in-domain test set, while ReAugProQANS performs best on out-of-domain real programming questions, outperforming the state-of-the-art model by up to 477% on the MRR metric. The results suggest that the models are robust to previously unseen questions and widely applicable to real programming questions.
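The abstract refers to a triplet loss and to lift on the MRR (mean reciprocal rank) metric without defining either. As a rough illustration only, the sketch below shows the textbook triplet margin loss and a plain MRR computation. It is not the paper's novel triplet loss or dual triplet loss, whose exact form the abstract does not give. The function names, the margin value, and the example ranks are all hypothetical, and "lift" is assumed here to mean relative improvement over a baseline.

```python
import math


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Textbook triplet loss: max(0, d(a, p) - d(a, n) + margin).

    Assumption: Euclidean distance and margin=1.0 are illustrative choices,
    not values taken from the paper.
    """
    d_pos = math.dist(anchor, positive)  # distance from query to relevant answer
    d_neg = math.dist(anchor, negative)  # distance from query to irrelevant answer
    return max(0.0, d_pos - d_neg + margin)


def mean_reciprocal_rank(first_relevant_ranks):
    """MRR over a set of queries.

    Each entry is the 1-based rank of the first relevant result for one
    query, or None when no relevant result was retrieved (contributes 0).
    """
    if not first_relevant_ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


def relative_lift_percent(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0


if __name__ == "__main__":
    # Made-up ranks for four queries; not results from the paper.
    baseline_mrr = mean_reciprocal_rank([10, None, 5, 20])
    model_mrr = mean_reciprocal_rank([1, 2, 1, 4])
    print(f"baseline MRR: {baseline_mrr:.4f}")
    print(f"model MRR:    {model_mrr:.4f}")
    print(f"lift:         {relative_lift_percent(model_mrr, baseline_mrr):.1f}%")
```

Under this reading of lift, a 477% lift corresponds to an MRR roughly 5.77 times that of the baseline.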

Supplemental Material

SIRIP23-sir1791.mp4 (mp4, 19.3 MB)

    • Published in

      SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2023
      3567 pages
      ISBN: 9781450394086
      DOI: 10.1145/3539618

      Copyright © 2023 Owner/Author

      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%