Abstract
This paper describes a supervised three-tier clustering method for classifying students’ qualitative physics essays in the Why2-Atlas tutoring system. Our main purpose in categorizing text is to map the statements in students’ essays onto physics principles and misconceptions. A simple ‘bag-of-words’ representation with a naïve Bayes classifier was unsatisfactory for our analyses: it produced many misclassifications because the concepts themselves are closely related, and it could not handle misconceptions. Hence, we investigate the performance of the k-nearest neighbor algorithm, coupled with clusters of physics concepts, on classifying students’ essays. We use a three-tier tagging scheme (cluster, sub-cluster and class) for each document and find that this kind of supervised hierarchical clustering leads to a better understanding of the student’s essay.
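The idea of combining a k-nearest neighbor vote with three-tier tags can be illustrated with a minimal sketch. It is not the Why2-Atlas implementation: the abstract does not specify the similarity measure, the value of k, or how the tiers interact. The sketch assumes cosine similarity over raw word counts and a top-down vote in which each tier only considers training essays that agree with the tier already chosen. The training sentences and all tag names (`force`, `gravity`, `gravity-acts-down`, and so on) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: (statement, (cluster, sub-cluster, class)).
TRAINING = [
    ("the force of gravity pulls the ball down",
     ("force", "gravity", "gravity-acts-down")),
    ("gravity pulls every object down toward the earth",
     ("force", "gravity", "gravity-acts-down")),
    ("heavier objects fall faster than light objects",
     ("force", "gravity", "misconception-heavier-falls-faster")),
    ("the horizontal velocity of the ball stays constant",
     ("motion", "velocity", "constant-horizontal-velocity")),
    ("the ball keeps the same horizontal velocity as the plane",
     ("motion", "velocity", "constant-horizontal-velocity")),
    ("acceleration is zero so velocity does not change",
     ("motion", "acceleration", "zero-acceleration")),
]


def bag_of_words(text):
    """Lower-case, whitespace-tokenised word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, training, k=3):
    """Assumed top-down scheme: vote on the cluster among the k nearest
    training statements, keep only statements in the winning cluster,
    then repeat for the sub-cluster and finally the class."""
    query = bag_of_words(text)
    candidates = list(training)
    tags = []
    for tier in range(3):  # 0 = cluster, 1 = sub-cluster, 2 = class
        ranked = sorted(
            candidates,
            key=lambda ex: cosine(query, bag_of_words(ex[0])),
            reverse=True,
        )
        votes = Counter(ex[1][tier] for ex in ranked[:k])
        winner = votes.most_common(1)[0][0]
        tags.append(winner)
        candidates = [ex for ex in candidates if ex[1][tier] == winner]
    return tuple(tags)


if __name__ == "__main__":
    print(classify("gravity pulls the ball down", TRAINING))
    print(classify("heavier objects fall faster", TRAINING, k=1))
```

Restricting each vote to the cluster chosen at the previous tier is one plausible way to keep closely related concepts from competing with one another, and it lets a misconception live as its own class inside the cluster it belongs to. Other designs, such as voting on all three tiers independently, fit the abstract's description equally well.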
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pappuswamy, U., Bhembe, D., Jordan, P.W., VanLehn, K. (2005). A Supervised Clustering Method for Text Classification. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_78
DOI: https://doi.org/10.1007/978-3-540-30586-6_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)