DOI: 10.1145/2984511.2984544
research-article

CodeMend: Assisting Interactive Programming with Bimodal Embedding

Published: 16 October 2016

Abstract

Software APIs often contain too many methods and parameters for developers to memorize or navigate effectively. Instead, developers resort to finding answers through online search engines and systems such as Stack Overflow. However, the process of finding and integrating a working solution is often very time-consuming. Though code search engines have increased in quality, significant language and workflow gaps remain in meeting end-user needs. Novice and intermediate programmers often lack the vocabulary to formulate a query and the expertise to transfer found code to their task. To address this problem, we present CodeMend, a system that supports finding and integrating code. CodeMend leverages a neural embedding model to jointly model natural language and code, as mined from large Web and code datasets. We also demonstrate a novel mixed-initiative interface to support the query and integration steps. Through CodeMend, end-users describe their goal in natural language. The system highlights the relevant API functions and the lines in the end-user's program that should be changed, and proposes the actual change. We demonstrate the utility and accuracy of CodeMend through lab and simulation studies.
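To make the idea of placing natural language and code in one shared space concrete, the following is a minimal sketch and not CodeMend's actual model. It swaps the neural embedding for a simple count-based co-occurrence space, and the toy corpus, the function names `build_vectors` and `rank_functions`, and the choice of matplotlib-style API names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (description, API functions used) pairs.
# Illustrative only; this is not data from the paper.
PAIRS = [
    ("draw a bar chart", ["plt.bar"]),
    ("add a title to the plot", ["plt.title"]),
    ("change bar color to red", ["plt.bar"]),
    ("set the title font size", ["plt.title"]),
    ("add a legend to the plot", ["plt.legend"]),
]


def build_vectors(pairs):
    """Give words and functions sparse vectors in one shared space.

    Each training pair is one dimension: a word or function gets a
    count in dimension i if it occurs in pair i.
    """
    word_vecs = defaultdict(Counter)
    func_vecs = defaultdict(Counter)
    for i, (text, funcs) in enumerate(pairs):
        for word in text.lower().split():
            word_vecs[word][i] += 1
        for func in funcs:
            func_vecs[func][i] += 1
    return word_vecs, func_vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_functions(query, word_vecs, func_vecs):
    """Rank API functions by similarity to a natural-language query."""
    query_vec = Counter()
    for word in query.lower().split():
        # Words never seen in the corpus contribute nothing.
        query_vec.update(word_vecs.get(word, {}))
    return sorted(func_vecs,
                  key=lambda f: cosine(query_vec, func_vecs[f]),
                  reverse=True)


word_vecs, func_vecs = build_vectors(PAIRS)
print(rank_functions("make the bar red", word_vecs, func_vecs))
```

A learned embedding would generalize to words and functions that never co-occur directly, which this count-based stand-in cannot do; the sketch only shows the retrieval step of mapping a stated goal to candidate API functions.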

Supplementary Material

Supplemental video: suppl.mov (uist2681-file3.mp4)
MP4 file: p247-rong.mp4

    Published In

    UIST '16: Proceedings of the 29th Annual Symposium on User Interface Software and Technology
    October 2016
    908 pages
    ISBN: 9781450341899
    DOI: 10.1145/2984511
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. natural language code search
    2. program embedding
    3. word embedding

    Qualifiers

    • Research-article

    Conference

    UIST '16

    Acceptance Rates

    UIST '16 Paper Acceptance Rate 79 of 384 submissions, 21%;
    Overall Acceptance Rate 561 of 2,567 submissions, 22%

    Cited By

    • (2024) Training AI Model that Suggests Python Code from Student Requests in Natural Language. Journal of Information Processing 32, 69-76. DOI: 10.2197/ipsjjip.32.69. Online publication date: 2024.
    • (2024) Grounding with Structure: Exploring Design Variations of Grounded Human-AI Collaboration in a Natural Language Interface. Proceedings of the ACM on Human-Computer Interaction 8:CSCW2, 1-27. DOI: 10.1145/3686902. Online publication date: 8 Nov 2024.
    • (2024) Modeling source code in bimodal for program comprehension. Neural Computing and Applications 36:22, 13815-13832. DOI: 10.1007/s00521-024-09498-0. Online publication date: 1 Aug 2024.
    • (2023) Generative Agents: Interactive Simulacra of Human Behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1-22. DOI: 10.1145/3586183.3606763. Online publication date: 29 Oct 2023.
    • (2023) Follow the Successful Herd: Towards Explanations for Improved Use and Mental Models of Natural Language Systems. Proceedings of the 28th International Conference on Intelligent User Interfaces, 220-239. DOI: 10.1145/3581641.3584088. Online publication date: 27 Mar 2023.
    • (2023) On the Design of AI-powered Code Assistants for Notebooks. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3544548.3580940. Online publication date: 19 Apr 2023.
    • (2023) Projectional Editors for JSON-Based DSLs. 2023 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 60-70. DOI: 10.1109/VL-HCC57772.2023.00015. Online publication date: 3 Oct 2023.
    • (2023) Too Simple? Notions of Task Complexity used in Maintenance-based Studies of Programming Tools. 2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), 254-265. DOI: 10.1109/ICPC58990.2023.00040. Online publication date: May 2023.
    • (2022) Contextualized Programming Language Documentation. Proceedings of the 2022 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, 1-15. DOI: 10.1145/3563835.3567654. Online publication date: 29 Nov 2022.
    • (2022) NL2Viz: natural language to visualization via constrained syntax-guided synthesis. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 972-983. DOI: 10.1145/3540250.3549140. Online publication date: 7 Nov 2022.
