A Supervised Clustering Method for Text Classification

Pappuswamy, Umarani; Bhembe, Dumisizwe; Jordan, Pamela W.; VanLehn, Kurt

doi:10.1007/978-3-540-30586-6_78


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics


Abstract

This paper describes a supervised three-tier clustering method for classifying students’ essays on qualitative physics in the Why2-Atlas tutoring system. Our main purpose in categorizing text in the tutoring system is to map the students’ essay statements onto principles and misconceptions of physics. A simple ‘bag-of-words’ representation with a naïve Bayes algorithm was unsatisfactory for our analyses: it produced many misclassifications because the concepts themselves are closely related and because it could not handle misconceptions. Hence, we investigate the performance of the k-nearest neighbor algorithm coupled with clusters of physics concepts on classifying students’ essays. We use a three-tier tagging schema (cluster, sub-cluster and class) for each document and find that this kind of supervised hierarchical clustering leads to a better understanding of the student’s essay.
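The abstract does not spell out how the three tiers are assigned, so the following is only a minimal sketch of one plausible reading: a k-nearest neighbor vote over bag-of-words vectors, applied tier by tier, with the candidate pool narrowed to the winning cluster and then the winning sub-cluster. The cosine similarity measure, the top-down narrowing, the function names, and the toy training sentences and labels are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_label(query, examples, k):
    """Majority label among the k examples most similar to the query."""
    ranked = sorted(examples, key=lambda e: cosine(query, e[0]), reverse=True)[:k]
    votes = Counter(label for _, label in ranked)
    return votes.most_common(1)[0][0]

def classify_three_tier(text, training, k=3):
    """Assign (cluster, sub_cluster, class) top-down.

    training is a list of (text, (cluster, sub_cluster, class)) pairs.
    After each tier is decided, only examples carrying the chosen
    label remain as candidates for the next tier (an assumption).
    """
    query = bag_of_words(text)
    pool = [(bag_of_words(t), tags) for t, tags in training]
    chosen = []
    for tier in range(3):
        label = knn_label(query, [(vec, tags[tier]) for vec, tags in pool], k)
        chosen.append(label)
        pool = [(vec, tags) for vec, tags in pool if tags[tier] == label]
    return tuple(chosen)

# Hypothetical toy data; the labels are invented for this sketch.
TRAINING = [
    ("the force of gravity pulls the keys down",
     ("force", "gravity", "gravity-acts-on-objects")),
    ("gravity is the only force acting on the pumpkin",
     ("force", "gravity", "gravity-only-force")),
    ("heavier objects fall faster than lighter ones",
     ("misconception", "freefall", "heavier-falls-faster")),
    ("the velocity of the keys stays constant horizontally",
     ("motion", "velocity", "constant-horizontal-velocity")),
]

print(classify_three_tier("heavier objects fall faster", TRAINING, k=1))
```

Narrowing the pool after each tier keeps the three tags consistent with one another, since a sub-cluster can only be chosen from examples that already belong to the selected cluster.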



Author information

Authors and Affiliations

Learning Research and Development Center, University of Pittsburgh, 3939 O’Hara Street, Pittsburgh, PA, 15260, USA
Umarani Pappuswamy, Dumisizwe Bhembe, Pamela W. Jordan & Kurt VanLehn


Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, México
Alexander Gelbukh


About this paper

Cite this paper

Pappuswamy, U., Bhembe, D., Jordan, P.W., VanLehn, K. (2005). A Supervised Clustering Method for Text Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_78


DOI: https://doi.org/10.1007/978-3-540-30586-6_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24523-0
Online ISBN: 978-3-540-30586-6
eBook Packages: Computer Science, Computer Science (R0)
