Abstract
We report experiments on automatic essay grading using Latent Dirichlet Allocation (LDA). LDA is a “bag-of-words” language modeling and dimension reduction method that has been reported to outperform two related methods, Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA), in the Information Retrieval (IR) domain. We introduce LDA in detail and compare its strengths and weaknesses with those of LSA and PLSA. We also compare the performance of LDA with that of LSA and PLSA empirically. The experiments were run on three essay sets comprising a total of 283 essays from different domains. Contrary to the findings in IR, LDA achieved slightly worse results than LSA and PLSA in these experiments. We discuss the reasons why LSA and PLSA outperformed LDA and indicate directions for further research.
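As a minimal illustration of what “bag-of-words” means here, the sketch below reduces a text to word counts, so that two texts with the same words in a different order become indistinguishable. The function name `bag_of_words` and the example sentences are assumptions made for illustration; they are not taken from the paper, and the whitespace tokenization is deliberately naive.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to its word counts; word order is discarded."""
    # Naive tokenization (lowercase + whitespace split), for illustration only.
    return Counter(text.lower().split())


# Two hypothetical sentences with the same words in a different order.
first = bag_of_words("the model grades the essay")
second = bag_of_words("the essay grades the model")

# Under a bag-of-words representation the two are identical.
print(first == second)
print(first["the"])
```

LSA, PLSA and LDA all start from counts of this kind and then map them to a lower-dimensional representation; that reduction step is not shown here.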
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kakkonen, T., Myller, N., Sutinen, E. (2006). Applying Latent Dirichlet Allocation to Automatic Essay Grading. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_13
DOI: https://doi.org/10.1007/11816508_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science (R0)