DOI: 10.1145/2970276.2970330

Evaluating the evaluations of code recommender systems: a reality check

Published: 25 August 2016

Abstract

While researchers develop many exciting new code recommender systems, such as method-call completion, code-snippet completion, or code search, accurately evaluating such systems remains a challenge. We analyzed the literature and found that most current evaluations rely on artificial queries extracted from released code, which raises the question: do such evaluations reflect real-life usage? To answer this question, we capture 6,189 fine-grained development histories from real IDE interactions. We use them as ground truth and extract 7,157 real queries for a specific method-call recommender system. We compare the results of these real queries with different artificial evaluation strategies and check several assumptions that are repeatedly used in research but never empirically evaluated. We find that the evolving context often observed in practice has a major effect on the prediction quality of recommender systems, but is not commonly reflected in artificial evaluations.
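To make concrete what the abstract means by the two evaluation styles, the following minimal sketch (in Python; not the authors' protocol, and all class/method names and data are hypothetical) builds an artificial query by holding out calls from a finished usage mined from released code, and contrasts it with a real query whose context is the incomplete code a developer actually had in the IDE, scored against what was eventually written.

```python
# Minimal sketch of the two query styles contrasted in the abstract.
# Not the authors' protocol; class/method names and data are hypothetical.
from typing import List, Set, Tuple


def artificial_query(finished_calls: List[str], keep: int) -> Tuple[List[str], Set[str]]:
    """Build an artificial query from released code: keep the first `keep`
    calls as the context and treat the removed calls as the expected answer."""
    return finished_calls[:keep], set(finished_calls[keep:])


def score(recommended: List[str], expected: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a recommendation list against the expected calls."""
    hits = sum(1 for call in recommended if call in expected)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


def recommend(context: List[str]) -> List[str]:
    """Stand-in for any method-call recommender under evaluation."""
    return ["Button.addListener", "Button.setVisible"]


# Artificial evaluation: hold out calls from a finished usage mined from released code.
finished = ["Button.<init>", "Button.setText", "Button.addListener", "Button.pack"]
context, expected = artificial_query(finished, keep=2)
print(score(recommend(context), expected))

# Real evaluation: the context captured in the IDE is an evolving, possibly smaller
# snapshot, and the ground truth is what the developer eventually wrote.
real_context = ["Button.<init>"]
real_expected = {"Button.setText", "Button.addListener", "Button.pack"}
print(score(recommend(real_context), real_expected))
```

The gap between the two scores illustrates the paper's point: artificial hold-out queries presuppose a nearly complete context, whereas real queries arrive against an evolving one, which the study finds has a major effect on prediction quality.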

Published In

ASE '16: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
August 2016
899 pages
ISBN:9781450338455
DOI:10.1145/2970276
General Chair: David Lo
Program Chairs: Sven Apel, Sarfraz Khurshid

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Artificial Evaluation
  2. Empirical Study
  3. IDE Interaction Data

Qualifiers

  • Research-article

Conference

ASE '16

Acceptance Rates

Overall acceptance rate: 82 of 337 submissions (24%)
