DOI: 10.1145/2970276.2970330

Evaluating the evaluations of code recommender systems: a reality check

Published: 25 August 2016

Abstract

While researchers develop many exciting new code recommender systems, such as method-call completion, code-snippet completion, or code search, accurately evaluating such systems remains a challenge. We analyzed the literature and found that most current evaluations rely on artificial queries extracted from released code, which raises the question: do such evaluations reflect real-life usage? To answer this question, we capture 6,189 fine-grained development histories from real IDE interactions. We use them as ground truth and extract 7,157 real queries for a specific method-call recommender system. We compare the results of these real queries with different artificial evaluation strategies and check several assumptions that are repeatedly used in research but never empirically evaluated. We find that the evolving context often observed in practice has a major effect on the prediction quality of recommender systems, but is not commonly reflected in artificial evaluations.
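To make concrete what the abstract means by the two evaluation styles, the following minimal sketch (in Python; not the authors' protocol, and all class/method names and data are hypothetical) builds an artificial query by holding out calls from a finished usage mined from released code, and contrasts it with a real query whose context is the incomplete code a developer actually had in the IDE, scored against what was eventually written.

```python
# Minimal sketch of the two query styles contrasted in the abstract.
# Not the authors' protocol; class/method names and data are hypothetical.
from typing import List, Set, Tuple


def artificial_query(finished_calls: List[str], keep: int) -> Tuple[List[str], Set[str]]:
    """Build an artificial query from released code: keep the first `keep`
    calls as the context and treat the removed calls as the expected answer."""
    return finished_calls[:keep], set(finished_calls[keep:])


def score(recommended: List[str], expected: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a recommendation list against the expected calls."""
    hits = sum(1 for call in recommended if call in expected)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


def recommend(context: List[str]) -> List[str]:
    """Stand-in for any method-call recommender under evaluation."""
    return ["Button.addListener", "Button.setVisible"]


# Artificial evaluation: hold out calls from a finished usage mined from released code.
finished = ["Button.<init>", "Button.setText", "Button.addListener", "Button.pack"]
context, expected = artificial_query(finished, keep=2)
print(score(recommend(context), expected))

# Real evaluation: the context captured in the IDE is an evolving, possibly smaller
# snapshot, and the ground truth is what the developer eventually wrote.
real_context = ["Button.<init>"]
real_expected = {"Button.setText", "Button.addListener", "Button.pack"}
print(score(recommend(real_context), real_expected))
```

The gap between the two scores illustrates the paper's point: artificial hold-out queries presuppose a nearly complete context, whereas real queries arrive against an evolving one, which the study finds has a major effect on prediction quality.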

Published In

ASE '16: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
August 2016
899 pages
ISBN:9781450338455
DOI:10.1145/2970276
General Chair: David Lo
Program Chairs: Sven Apel, Sarfraz Khurshid

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Artificial Evaluation
  2. Empirical Study
  3. IDE Interaction Data

Qualifiers

  • Research-article

Conference

ASE '16

Acceptance Rates

Overall acceptance rate: 82 of 337 submissions (24%)
