A Machine Learning Approach for Identification Thesis and Conclusion Statements in Student Essays

Computers and the Humanities

Abstract

This study describes and evaluates two essay-based discourse analysis systems that identify thesis and conclusion statements from student essays written on six different essay topics. Essays used to train and evaluate the systems were annotated by two human judges, according to a discourse annotation protocol. Using a machine learning approach, a number of discourse-related features were automatically extracted from a set of annotated training data. Using these features, two discourse analysis models were built using C5.0 with boosting: a topic-dependent and a topic-independent model. Both systems outperformed a positional algorithm. While the topic-dependent system showed somewhat higher performance, the topic-independent system showed similar results, indicating that a system can generalize to unseen data – that is, essay responses on topics that the system has not seen in training.
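The evaluation setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy records, the rule "the first sentence is the thesis", and the function names `positional_baseline`, `leave_one_topic_out`, and `accuracy` are all assumptions made for illustration. The sketch shows only how a topic-independent evaluation holds out every essay on one topic, and how a purely positional rule can serve as a baseline for comparison.

```python
from collections import defaultdict

# Hypothetical toy data: (topic, sentence position in essay, is_thesis label).
# These records are illustrative only and are not drawn from the study.
SENTENCES = [
    ("A", 0, True), ("A", 1, False), ("A", 2, False),
    ("B", 0, False), ("B", 1, True), ("B", 2, False),
    ("C", 0, True), ("C", 1, False), ("C", 2, False),
]


def positional_baseline(position):
    """Assumed baseline rule: call the first sentence of an essay the thesis."""
    return position == 0


def leave_one_topic_out(records):
    """Yield (held_out_topic, train, test) splits.

    Each split tests on a topic that contributes nothing to training,
    which mirrors the topic-independent condition.
    """
    by_topic = defaultdict(list)
    for record in records:
        by_topic[record[0]].append(record)
    for held_out in sorted(by_topic):
        train = [r for topic, recs in by_topic.items()
                 if topic != held_out for r in recs]
        yield held_out, train, by_topic[held_out]


def accuracy(records, predict):
    """Fraction of sentences whose predicted label matches the annotation."""
    correct = sum(predict(pos) == label for _, pos, label in records)
    return correct / len(records)


if __name__ == "__main__":
    for topic, train, test in leave_one_topic_out(SENTENCES):
        print(topic, len(train), len(test), accuracy(test, positional_baseline))
```

In a topic-dependent setup, by contrast, training and test essays would be drawn from the same topic. A learned classifier would take the place of `positional_baseline` in `accuracy`, and its score on each held-out topic would be compared against the baseline's.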

About this article

Cite this article

Burstein, J., Marcu, D. A Machine Learning Approach for Identification Thesis and Conclusion Statements in Student Essays. Computers and the Humanities 37, 455–467 (2003). https://doi.org/10.1023/A:1025746505971

  • DOI: https://doi.org/10.1023/A:1025746505971
