Stratified Learning for Reducing Training Set Size

Hastings, Peter; Hughes, Simon; Blaum, Dylan; Wallace, Patricia; Britt, M. Anne

doi:10.1007/978-3-319-39583-8_39

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9684)

Included in the following conference series:

International Conference on Intelligent Tutoring Systems

Abstract

Educational standards put a renewed focus on strengthening students’ abilities to construct scientific explanations and engage in scientific arguments. Evaluating student explanatory writing is extremely time-intensive, so we are developing techniques to automatically analyze the causal structure in student essays so that effective feedback may be provided. These techniques rely on a significant training corpus of annotated essays. Because one of our long-term goals is to make it easier to establish this approach in new subject domains, we are keenly interested in the question of how much training data is enough to support this. This paper describes our analysis of that question and examines one mechanism for reducing that data requirement: using student scores on a related multiple-choice test.

P. Hastings—The assessment project described in this article is funded, in part, by the Institute for Education Sciences, U.S. Department of Education (Grant R305F100007). The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.

Notes

1. The choice of group size is significant. As mentioned above, the distribution of multiple-choice scores was fairly normal, and the least frequent score, 0, was assigned to 31 students. To maintain balanced representation of groups in the training set, some aggregation is necessary; otherwise we could only test on a maximum of 31 items from each group. If the aggregation were too broad, however, it would decrease any benefit of balance in the training set.
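
The trade-off in this note can be illustrated with a short sketch. It is not the authors' code: the score range, the bin width, the student identifiers, and the function names `bin_scores` and `balanced_sample` are all assumptions made for illustration. The sketch aggregates raw scores into groups and then draws the same number of students from each group, so the smallest group caps how many can be drawn per group.

```python
import random
from collections import defaultdict


def bin_scores(scores, bin_width):
    """Aggregate raw scores into groups: group key = score // bin_width.

    `scores` maps a (hypothetical) student id to an integer test score.
    """
    groups = defaultdict(list)
    for student_id, score in scores.items():
        groups[score // bin_width].append(student_id)
    return dict(groups)


def balanced_sample(groups, per_group, seed=0):
    """Draw `per_group` students from every group, without replacement."""
    smallest = min(len(members) for members in groups.values())
    if per_group > smallest:
        # The least populated group limits a balanced draw.
        raise ValueError(
            f"per_group={per_group} exceeds smallest group size {smallest}"
        )
    rng = random.Random(seed)
    sample = []
    for key in sorted(groups):
        sample.extend(rng.sample(groups[key], per_group))
    return sample


# Synthetic data (assumed): 90 students, scores 0..8, 10 students per score.
scores = {student_id: student_id % 9 for student_id in range(90)}

narrow = bin_scores(scores, bin_width=1)  # 9 groups of 10 students
wide = bin_scores(scores, bin_width=3)    # 3 groups of 30 students

# Wider bins allow more students per group, at the cost of coarser balance.
training_ids = balanced_sample(wide, per_group=20)
```

With a bin width of 1, no more than 10 students per group could be drawn from this synthetic data; widening the bins raises that cap but makes each group less homogeneous, which mirrors the tension the note describes.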

Author information

Authors and Affiliations

DePaul University, Chicago, IL, USA
Peter Hastings & Simon Hughes
Northern Illinois University, DeKalb, IL, USA
Dylan Blaum, Patricia Wallace & M. Anne Britt

Corresponding author

Correspondence to Peter Hastings.

Editor information

Editors and Affiliations

Roma Tre University, Rome, Italy
Alessandro Micarelli
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
John Stamper
Neoanalysis Ltd, Athens, Greece
Kitty Panourgia

About this paper

Cite this paper

Hastings, P., Hughes, S., Blaum, D., Wallace, P., Britt, M.A. (2016). Stratified Learning for Reducing Training Set Size. In: Micarelli, A., Stamper, J., Panourgia, K. (eds) Intelligent Tutoring Systems. ITS 2016. Lecture Notes in Computer Science, vol 9684. Springer, Cham. https://doi.org/10.1007/978-3-319-39583-8_39

DOI: https://doi.org/10.1007/978-3-319-39583-8_39
Published: 02 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39582-1
Online ISBN: 978-3-319-39583-8
eBook Packages: Computer Science, Computer Science (R0)
