Measuring code behavioral similarity for programming and software engineering education

Published: 14 May 2016

Abstract

In recent years, online programming and software engineering education delivered via information technology has gained substantial popularity. Popular courses often have hundreds or thousands of students but only a few course staff members, so tool automation is needed to maintain the quality of education. In this paper, we envision that the capability of quantifying behavioral similarity between programs is helpful for teaching and learning programming and software engineering, and we propose three metrics that approximate the computation of behavioral similarity. Specifically, we leverage random testing and dynamic symbolic execution (DSE) to generate test inputs, and run programs on these test inputs to compute metric values of behavioral similarity. We evaluate our metrics on three real-world data sets from the Pex4Fun platform (which so far has accumulated more than 1.7 million game-play interactions). The results show that our metrics provide a highly accurate approximation of behavioral similarity. We also demonstrate a number of practical applications of our metrics, including hint generation, progress indication, and automatic grading.
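The core idea described in the abstract can be sketched as follows: generate a pool of test inputs (here by plain random testing; the paper also uses DSE), run two programs on the same inputs, and take the fraction of inputs on which their outputs agree as an approximation of behavioral similarity. The task, the buggy submission, and the metric name below are illustrative assumptions, not the paper's actual metrics or data.

```python
import random

# Hypothetical reference task: return the digit sum of a non-negative integer.
def reference(n):
    return sum(int(d) for d in str(n))

# Illustrative student submission with a bug for inputs above 99.
def submission(n):
    return sum(int(d) for d in str(n)) + (1 if n > 99 else 0)

def behavioral_similarity(p, q, inputs):
    """Fraction of shared test inputs on which p and q produce equal outputs."""
    agree = sum(1 for x in inputs if p(x) == q(x))
    return agree / len(inputs)

# Random testing: sample inputs from the task's input domain.
random.seed(0)
tests = [random.randint(0, 999) for _ in range(1000)]

score = behavioral_similarity(reference, submission, tests)
print(score)  # close to 0.1, since the programs agree only for n <= 99
```

A grading or hint-generation tool could rank submissions by such a score against a reference solution; DSE-generated inputs would additionally target branch conditions that uniform random sampling is unlikely to hit.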


    Published In

    ICSE '16: Proceedings of the 38th International Conference on Software Engineering Companion
    May 2016
    946 pages
    ISBN:9781450342056
    DOI:10.1145/2889160
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%
