Question Independent Grading using Machine Learning: The Case of Computer Program Grading

Authors:
Gursimran Singh, Aspiring Minds, Gurgaon, India
Shashank Srikant, Aspiring Minds, Gurgaon, India
Varun Aggarwal, Aspiring Minds, Gurgaon, India

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, Pages 263–272
https://doi.org/10.1145/2939672.2939696

Published: 13 August 2016

ABSTRACT

Learning supervised models to grade open-ended responses is an expensive process. A model has to be trained separately for every prompt or question, which in turn requires graded samples. In automatic program evaluation, the focus of this work, this issue is amplified: models have to be trained not only for every question but also for every language the question is offered in. Moreover, the limited availability of experts and the time they need to create a labeled set of programs for each question are a major bottleneck in scaling such a system. We address this issue by presenting a method to grade computer programs that requires no manually labeled samples for grading responses to a new, unseen question. We extend our previous work [25], in which we introduced a grammar of features to learn question-specific models. In this work, we propose a method to transform those features into a set of features that maintain their structural relation with the labels across questions. Using these features, we learn one supervised model across questions for a given language, which can then be applied to an ungraded response to an unseen question. We show that our method rivals the performance of both question-specific models and the consensus among human experts, while substantially outperforming extant ways of evaluating code. We demonstrate the system's value by deploying it to grade programs in a high-stakes assessment. The lessons from this work are transferable to other grading tasks, such as math question grading, and also provide a new variation on the supervised learning approach.
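The abstract does not spell out the feature transformation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single numeric feature per program and a hypothetical per-question reference set of responses that needs no expert grading. Each raw feature value is re-expressed as its distance from that reference set, so that one model fitted across graded questions can score a response to an unseen question. All names and data below are invented for illustration.

```python
from statistics import mean, pstdev


def relative_deviation(value, reference):
    """Distance of a raw feature value from a per-question reference set,
    in units of that set's spread (a hypothetical transformation)."""
    spread = pstdev(reference) or 1.0  # guard against a zero-spread reference
    return abs(value - mean(reference)) / spread


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy data (invented): question -> (reference raw values, [(raw value, expert grade)]).
# The raw feature lives on a different scale in each question.
graded = {
    "A": ([2, 2, 3, 3], [(2.5, 5), (3.5, 3), (4.5, 1)]),
    "B": ([10, 10, 14, 14], [(12, 5), (16, 3), (20, 1)]),
}

# Pool the transformed features from all graded questions into one training set.
xs, ys = [], []
for reference, responses in graded.values():
    for raw, grade in responses:
        xs.append(relative_deviation(raw, reference))
        ys.append(grade)

slope, intercept = fit_line(xs, ys)

# Unseen question C: only an ungraded reference set is used, no expert grades.
reference_c = [100, 100, 120, 120]
predicted = intercept + slope * relative_deviation(130, reference_c)
print(round(predicted, 2))
```

In this toy setup the raw values of questions A, B, and C are not comparable, but their deviations from each question's own reference set are, which is what lets a single fitted line carry over to question C.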

References

[1] Automata. Aspiring Minds http://www.aspiringminds.com/technology/automata.
[2] E-rater. ETS http://www.ets.org/research/topics/as_nlp/writing_quality/.
[3] Intelli metric. Vantage Learning http://www.vantagelearning.com/products/intellimetric/.
[4] Speechrater. ETS https://www.ets.org/research/topics/as_nlp/speech/.
[5] Svar. Aspiring Minds http://www.aspiringminds.com/technology/svar.
[6] V. Aggarwal, S. Srikant, and V. Shashidhar. Principles for using machine learning in the assessment of open response items: Programming assessment as a case study. In NIPS Workshop on Data Driven Education, 2013.
[7] J. Baxter. A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 28(1):7–39, 1997.
[8] J. Bernstein, A. Van Moere, and J. Cheng. Validating automated speaking tests. Language Testing, 2010.
[9] M. Birenbaum and K. K. Tatsuoka. Open-ended versus multiple-choice response formats-it does make a difference for diagnostic purposes. Applied Psychological Measurement, 11(4):385–395, 1987.
[10] H. M. Breland. The direct assessment of writing skill: A measurement review. ETS Research Report Series, 1983(2):i–23, 1983.
[11] J. Burstein, L. Braden-Harder, M. Chodorow, S. Hua, B. Kaplan, K. Kukich, C. Lu, J. Nolan, D. Rock, and S. Wolff. Computer analysis of essay content for automated score prediction: A prototype automated scoring system for GMAT analytical writing assessment essays. ETS Research Report Series, 1998(1):i–67, 1998.
[12] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[13] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, pages 101–126, 2006.
[14] E. L. Glassman, J. Scott, R. Singh, P. J. Guo, and R. C. Miller. Overcode: Visualizing variation in student solutions to programming problems at scale. ACM Transactions on Computer-Human Interaction (TOCHI), 22(2):7, 2015.
[15] J. Huang, C. Piech, A. Nguyen, and L. Guibas. Syntactic and functional variability of a million code submissions in a machine learning MOOC. In AIED 2013 Workshops Proceedings Volume, page 25. Citeseer, 2013.
[16] A. S. Lan, D. Vats, A. E. Waters, and R. G. Baraniuk. Mathematical language processing: Automatic grading and feedback for open response mathematical questions. In Proceedings of the Second (2015) ACM Conference on Learning@ Scale, pages 167–176. ACM, 2015.
[17] N. Meinshausen and P. Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417–473, 2010.
[18] L. Pappano. The year of the MOOC. The New York Times (Accessed: 2016-2-2).
[19] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12:2825–2830, 2011.
[20] K. Rivers and K. R. Koedinger. Automatic generation of programming feedback: A data-driven approach. In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013), page 50, 2013.
[21] V. Shashidhar, N. Pandey, and V. Aggarwal. Automatic spontaneous speech grading: A novel feature derivation technique using the crowd. In Proceedings of the Conference of the Association for Computational Linguistics. ACL, 2015.
[22] V. Shashidhar, N. Pandey, and V. Aggarwal. Spoken English grading: Machine learning with crowd intelligence. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2089–2097. ACM, 2015.
[23] R. Singh, S. Gulwani, and A. Solar-Lezama. Automated feedback generation for introductory programming assignments. In ACM SIGPLAN Notices, volume 48, pages 15–26. ACM, 2013.
[24] V. Southavilay, K. Yacef, P. Reimann, and R. A. Calvo. Analysis of collaborative writing processes using revision maps and probabilistic topic models. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, pages 38–47. ACM, 2013.
[25] S. Srikant and V. Aggarwal. A system to grade computer programming skills using machine learning. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1887–1896. ACM, 2014.
[26] S. Thrun. Is learning the n-th thing any easier than learning the first? Advances in Neural Information Processing Systems, pages 640–646, 1996.
[27] C. Vleuten, G. Norman, and E. Graaff. Pitfalls in the pursuit of objectivity: issues of reliability. Medical Education, 25(2):110–118, 1991.

Published in
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672
General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)
Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
MOOC
automatic grading
feature engineering
one-class learning
question independent learning
recruitment
supervised learning
Qualifiers
- research-article
Acceptance Rates
KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%