DOI: 10.1145/3632754.3633480
Extended abstract

Efficiency of Large Language Models to scale up Ground Truth: Overview of the IRSE Track at Forum for Information Retrieval 2023

Published: 12 February 2024

Abstract

The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for the automated evaluation of code comments within a machine learning framework, using labels generated by both humans and large language models. The track poses a binary classification task: classifying each comment as either useful or not useful. The dataset comprises 9,048 pairs of code comments and their surrounding code snippets, drawn from open-source C projects on GitHub, together with an additional dataset generated by participating teams using large language models. In total, 17 teams from various universities and software companies contributed 56 experiments. These experiments were assessed quantitatively, primarily by F1-score, and qualitatively, based on the features developed, the supervised learning models employed, and their hyperparameters. Labels generated by large language models introduce bias into the prediction model but lead to less over-fitted results.
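To make the primary quantitative metric concrete, here is a minimal Python sketch of binary F1 for the useful/not-useful task. It assumes the labels are the strings "Useful" and "Not Useful" and treats "Useful" as the positive class. The abstract does not state whether the track reported positive-class or macro-averaged F1, so this is an illustration, not the track's official scorer; the function name `f1_score` is hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "Useful") -> float:
    """Positive-class F1 for a binary comment-usefulness task.

    Assumption: labels are plain strings and `positive` marks the "useful" class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives and false negatives for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    # With no true positives, precision or recall is zero (or undefined), so F1 is 0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical gold labels and model predictions for five comment/code pairs.
    gold = ["Useful", "Not Useful", "Useful", "Useful", "Not Useful"]
    pred = ["Useful", "Useful", "Not Useful", "Useful", "Not Useful"]
    print(round(f1_score(gold, pred), 4))  # prints 0.6667
```

In this toy example there are two true positives, one false positive and one false negative, so precision and recall are both 2/3 and F1 is 2/3. The same function could score predictions against either human or LLM-generated gold labels.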



Published In

ACM Other conferences
FIRE '23: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation
December 2023
170 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

1. Abstract syntax tree
2. BERT
3. GPT-2
4. Neural networks
5. Stanford POS Tagging

Qualifiers

• Extended-abstract
• Research
• Refereed limited

Conference

FIRE 2023

Acceptance Rates

Overall acceptance rate: 19 of 64 submissions (30%)
