DOI: 10.1145/355214.355252

Text categorization using hybrid (mined) terms (poster session)

Published: 01 November 2000

ABSTRACT

This paper evaluates text categorization using characters, bigrams, words, and hybrid terms. These terms were also augmented with mined terms. Classifiers using hybrid terms did not achieve better classification performance. Using data mining techniques to add new terms to the dictionary improved the performance of character-based classifiers. Our naïve comparison between the Pat-tree classifier and our best classifier shows that the Pat-tree classifier has the higher precision (77%), while our best classifier has the higher recall (72%) and the lower storage requirement (13%).
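
The abstract names the term types but does not define how they are extracted. The following is a minimal sketch of one plausible reading, not the authors' method. It assumes that character terms are single non-space characters, that bigrams are overlapping character pairs, and that hybrid terms are dictionary words found by greedy longest match, with bigrams covering the stretches no dictionary word matches. The function names, the `max_len` parameter, and the sample dictionary are all hypothetical.

```python
def char_terms(text):
    """Single-character terms, ignoring whitespace."""
    return [c for c in text if not c.isspace()]


def bigram_terms(text):
    """Overlapping character bigrams over the non-space characters."""
    chars = char_terms(text)
    return [a + b for a, b in zip(chars, chars[1:])]


def hybrid_terms(text, dictionary, max_len=4):
    """Assumed hybrid scheme: dictionary words via greedy longest match,
    bigrams (or a lone character) for runs that no word covers."""
    chars = char_terms(text)
    terms, pending = [], []

    def flush():
        # Emit terms for the pending run of characters not covered by a word.
        if len(pending) == 1:
            terms.append(pending[0])
        else:
            terms.extend(a + b for a, b in zip(pending, pending[1:]))
        pending.clear()

    i = 0
    while i < len(chars):
        # Try the longest candidate first; words have at least 2 characters.
        for n in range(min(max_len, len(chars) - i), 1, -1):
            candidate = "".join(chars[i:i + n])
            if candidate in dictionary:
                flush()
                terms.append(candidate)
                i += n
                break
        else:
            # No dictionary word starts here; defer this character.
            pending.append(chars[i])
            i += 1
    flush()
    return terms


if __name__ == "__main__":
    sample = "信息检索系统"            # illustrative input only
    words = {"信息", "检索"}           # hypothetical dictionary
    print(char_terms(sample))
    print(bigram_terms(sample))
    print(hybrid_terms(sample, words))
```

Under these assumptions, adding mined terms would amount to adding new entries to `dictionary` before extraction. How the paper actually mines and weights those terms is not stated in the abstract.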

        • Published in

          IRAL '00: Proceedings of the fifth international workshop on Information retrieval with Asian languages
          November 2000
          220 pages
          ISBN: 1581133006
          DOI: 10.1145/355214
          • Chairmen:
          • Kam-Fai Wong,
          • Dik L. Lee,
          • Jong-Hyeok Lee

          Copyright © 2000 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States
