Text categorization using hybrid (mined) terms (poster session)

Authors:
C. K. P. Wong, Chinese University of Hong Kong, Dept. of Systems Eng. and Eng. Management, Shatin, Hong Kong
R. W. P. Luk
K. F. Wong
K. L. Kwok, Queens College, CUNY, Dept. Computer Science, New York

IRAL '00: Proceedings of the fifth international workshop on Information retrieval with Asian languages, November 2000, Pages 217–218, https://doi.org/10.1145/355214.355252

Published: 01 November 2000

ABSTRACT

This paper evaluates text categorization using characters, bigrams, words, and hybrid terms. Each of these term types was also augmented with mined terms. Classifiers using hybrid terms did not achieve better classification performance. Using data mining techniques to add new terms to the dictionary improved the performance of character-based classifiers. A naïve comparison between the Pat-tree classifier and our best classifier shows that the Pat-tree classifier has the higher precision (77%), while our best classifier has the higher recall (72%) and the lower storage requirement (13%).
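The four term types compared above can be illustrated with a small sketch. The abstract does not define how hybrid terms are built, so the scheme here (dictionary words where a greedy forward match succeeds, overlapping bigrams over the unmatched runs) is an assumption for illustration only, and the function names, the `max_len` parameter, and the toy dictionary are all hypothetical rather than taken from the paper.

```python
def char_terms(text):
    """Every character is one term."""
    return list(text)


def bigram_terms(text):
    """Overlapping character bigrams."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def word_terms(text, dictionary, max_len=4):
    """Greedy forward maximum matching against a dictionary.

    Characters that start no dictionary word are emitted as
    single-character terms (an assumption of this sketch).
    """
    terms = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in dictionary:
                terms.append(cand)
                i += n
                break
    return terms


def _flush(run):
    """Turn a run of unmatched characters into bigrams (or one character)."""
    if len(run) >= 2:
        return bigram_terms(run)
    return [run] if run else []


def hybrid_terms(text, dictionary, max_len=4):
    """Assumed hybrid scheme: words where matched, bigrams elsewhere."""
    terms = []
    run = ""
    for tok in word_terms(text, dictionary, max_len):
        if tok in dictionary:
            terms.extend(_flush(run))
            run = ""
            terms.append(tok)
        else:
            run += tok
    terms.extend(_flush(run))
    return terms


if __name__ == "__main__":
    text = "香港中文大學"
    dictionary = {"香港", "大學"}  # toy dictionary, hypothetical
    print(char_terms(text))
    print(bigram_terms(text))
    print(word_terms(text, dictionary))
    print(hybrid_terms(text, dictionary))
    # Adding a "mined" term to the dictionary changes the segmentation.
    print(word_terms(text, dictionary | {"中文"}))
```

Under these assumptions, augmenting the dictionary with a mined term simply means passing a larger set to `word_terms` or `hybrid_terms`, which is one way to picture how newly discovered terms could change the features a classifier sees.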

Index Terms
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Published in
IRAL '00: Proceedings of the fifth international workshop on Information retrieval with Asian languages
November 2000
220 pages
ISBN:1581133006
DOI:10.1145/355214
Chairmen:
Kam-Fai Wong, Chinese Univ. of Hong Kong, Hong Kong, China
Dik L. Lee, Hong Kong Univ. of Science and Technology, Hong Kong, China
Jong-Hyeok Lee, Pohang Univ. of Science and Technology, Korea
Copyright © 2000 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data mining
evaluation
text categorization