A Supervised Clustering Method for Text Classification

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3406)

Abstract

This paper describes a supervised three-tier clustering method for classifying students’ qualitative physics essays in the Why2-Atlas tutoring system. Our main purpose in categorizing text in the tutoring system is to map statements in students’ essays to physics principles and misconceptions. A simple ‘bag-of-words’ representation with a naïve Bayes algorithm proved unsatisfactory for our analyses: it produced many misclassifications because the concepts themselves are closely related and because it could not handle misconceptions. Hence, we investigate the performance of the k-nearest neighbor algorithm coupled with clusters of physics concepts on classifying students’ essays. We use a three-tier tagging schema (cluster, sub-cluster, and class) for each document and find that this kind of supervised hierarchical clustering leads to a better understanding of the student’s essay.
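
The three-tier idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the value of k, or how the tiers interact, so everything below is an assumption made for illustration: cosine similarity over raw word counts, a top-down pass that narrows the candidate pool after each tier, and the hypothetical helper names (`bag_of_words`, `knn_vote`, `classify`). The training statements and labels are invented and are not taken from the Why2-Atlas corpus.

```python
import math
from collections import Counter

TIERS = ("cluster", "sub_cluster", "class")

def bag_of_words(text):
    """Lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_vote(query, examples, tier, k):
    """Majority label at one tier among the k most similar examples."""
    ranked = sorted(examples, key=lambda ex: cosine(query, ex["vector"]),
                    reverse=True)
    votes = Counter(ex[tier] for ex in ranked[:k])
    return votes.most_common(1)[0][0]

def classify(text, training, k=3):
    """Assign cluster, then sub-cluster, then class (assumed top-down order).

    After each tier, only examples that carry the predicted label stay
    in the candidate pool for the next, finer-grained tier.
    """
    query = bag_of_words(text)
    candidates = training
    result = {}
    for tier in TIERS:
        label = knn_vote(query, candidates, tier, k)
        result[tier] = label
        candidates = [ex for ex in candidates if ex[tier] == label]
    return result

def make_example(text, cluster, sub_cluster, label):
    return {"vector": bag_of_words(text), "cluster": cluster,
            "sub_cluster": sub_cluster, "class": label}

# Invented statements and labels, for illustration only.
TRAINING = [
    make_example("gravity pulls the keys down",
                 "force", "gravity", "principle_gravity_acts_down"),
    make_example("gravity is the only force on the keys",
                 "force", "gravity", "principle_only_force_is_gravity"),
    make_example("heavier objects fall faster",
                 "motion", "free_fall", "misconception_heavier_falls_faster"),
    make_example("the keys and the man fall at the same rate",
                 "motion", "free_fall", "principle_same_acceleration"),
]

if __name__ == "__main__":
    statement = "heavier objects fall faster than lighter objects"
    print(classify(statement, TRAINING, k=1))
```

Narrowing the pool tier by tier is one plausible way to keep closely related concepts from competing directly at the class level; a flat k-nearest neighbor vote over classes alone would be the obvious baseline to compare against.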



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pappuswamy, U., Bhembe, D., Jordan, P.W., VanLehn, K. (2005). A Supervised Clustering Method for Text Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2005. Lecture Notes in Computer Science, vol 3406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30586-6_78

  • DOI: https://doi.org/10.1007/978-3-540-30586-6_78

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24523-0

  • Online ISBN: 978-3-540-30586-6

  • eBook Packages: Computer Science, Computer Science (R0)
