An Effective Document Classification System Based on Concept Probability Vector

  • Conference paper
Content Computing (AWCC 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3309)

Abstract

This paper presents a concept-based document classification system that classifies Korean documents efficiently by means of a thesaurus tool. The thesaurus tool is an information extractor that acquires the meanings of document terms from a thesaurus, and the system uses those meanings to support classification. The system represents the meanings of terms with a concept-probability vector. Because the category of a document depends on the meanings of its terms rather than on the terms themselves, the system can classify documents without performance degradation even when the vector is small. Using a small concept-probability vector saves both time and space during classification. The experimental results suggest that the presented system, together with the thesaurus tool, classifies documents effectively.
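
The abstract does not spell out how the concept-probability vector is built, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a toy English thesaurus (`TOY_THESAURUS`, hypothetical) that maps each term to one or more concepts, splits an ambiguous term's weight evenly across its concepts, normalizes the result so the entries sum to 1, and picks a category by cosine similarity against hypothetical per-category vectors. All names and data here are illustrative.

```python
from collections import Counter
import math

# Hypothetical toy thesaurus: term -> concept labels (an assumption, not from the paper).
TOY_THESAURUS = {
    "bank": ["finance", "geography"],
    "loan": ["finance"],
    "stock": ["finance"],
    "river": ["geography"],
    "mountain": ["geography"],
}
# Fixed concept order; the vector has one entry per concept, not per term.
CONCEPTS = ["finance", "geography"]


def concept_probability_vector(terms):
    """Map terms to concepts and normalize the concept mass to sum to 1."""
    mass = Counter()
    for term in terms:
        concepts = TOY_THESAURUS.get(term, [])
        for concept in concepts:
            # Assumption: an ambiguous term spreads its weight evenly.
            mass[concept] += 1.0 / len(concepts)
    total = sum(mass.values())
    if total == 0:
        # No term was found in the thesaurus.
        return [0.0] * len(CONCEPTS)
    return [mass[c] / total for c in CONCEPTS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(terms, category_vectors):
    """Return the category whose vector is most similar to the document's."""
    vec = concept_probability_vector(terms)
    return max(category_vectors, key=lambda name: cosine(vec, category_vectors[name]))


# Hypothetical category profiles over the same concept order.
CATEGORY_VECTORS = {"economy": [1.0, 0.0], "travel": [0.0, 1.0]}
print(classify(["bank", "loan", "stock"], CATEGORY_VECTORS))
```

The vector length equals the number of concepts rather than the vocabulary size, which illustrates why a concept-level representation can stay small.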

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kang, HK., Hwang, YG., Ryu, PM. (2004). An Effective Document Classification System Based on Concept Probability Vector. In: Chi, CH., Lam, KY. (eds) Content Computing. AWCC 2004. Lecture Notes in Computer Science, vol 3309. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30483-8_56

  • DOI: https://doi.org/10.1007/978-3-540-30483-8_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23898-0

  • Online ISBN: 978-3-540-30483-8

  • eBook Packages: Springer Book Archive
