CONDOCS: A Concept-Based Document Categorization System Using Concept-Probability Vector with Thesaurus

Kang, Hyun-Kyu; Lee, Jeong-Oog; Jeon, Heung Seok; Ko, Myeong-Cheol; Kim, Doo Hyun; Oh, Ryum-Duck; Kang, Wonseog

doi:10.1007/978-3-540-30583-5_72

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3397)

Abstract

Traditional approaches to document categorization rely on term-based classification techniques. Because these techniques must handle a very large number of terms, they are poorly suited to applications that need fast response or have limited storage. This paper presents a concept-based document categorization system that classifies Korean documents efficiently with the help of a thesaurus tool. The thesaurus tool is an information extractor that acquires the meanings of document terms from a thesaurus, and the system uses these acquired meanings for categorization. The system represents the meanings of the terms with a concept-probability vector. Because the category of a document depends more on the meanings than on the terms themselves, the system can classify documents without performance degradation even though the vector is small. Using a small concept-probability vector also saves time and space during categorization. The experimental results suggest that the presented system, with the thesaurus tool, can classify documents effectively, and that using the contracted vector does not degrade its performance.
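
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy thesaurus, the function names, and the use of cosine similarity to compare vectors are all assumptions made for illustration, since the abstract does not specify them. The sketch maps each term to a concept through the thesaurus, so the vector has one entry per concept instead of one per term.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (term -> concept); not taken from the paper.
THESAURUS = {
    "stock": "finance", "bond": "finance", "bank": "finance",
    "goal": "sport", "match": "sport", "striker": "sport",
}
# Fixed concept order; the vector length equals the number of concepts.
CONCEPTS = sorted(set(THESAURUS.values()))


def concept_probability_vector(terms):
    """Return the relative frequency of each concept among the known terms."""
    counts = Counter(THESAURUS[t] for t in terms if t in THESAURUS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CONCEPTS)
    return [counts[c] / total for c in CONCEPTS]


def cosine(u, v):
    """Cosine similarity (an assumed measure, not confirmed by the abstract)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(terms, category_vectors):
    """Pick the category whose concept vector is most similar to the document's."""
    doc_vector = concept_probability_vector(terms)
    return max(category_vectors, key=lambda c: cosine(doc_vector, category_vectors[c]))


# Category vectors built from hypothetical training terms.
category_vectors = {
    "economy": concept_probability_vector(["stock", "bond", "bank", "stock"]),
    "sports": concept_probability_vector(["goal", "match", "striker"]),
}
label = classify(["bank", "bond", "goal"], category_vectors)
```

In this toy setting six terms collapse into two concepts, which is the kind of vector contraction the abstract credits with saving time and space.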

Author information

Authors and Affiliations

Dept. of Computer Science, Konkuk University, 322 Danwol-dong, Chungju-city, Chungbuk, 380-701, Korea
Hyun-Kyu Kang, Jeong-Oog Lee, Heung Seok Jeon & Myeong-Cheol Ko
Dept. of Internet & Multimedia, Konkuk University, 1 Hwayang-dong, Gwangjin-gu, Seoul, 143-701, Korea
Doo Hyun Kim
Dept. of Computer Science, Chungju National University, 123 Goemdan-ri, Eryu-meon, Chungju-city, Chungbuk, 380-702, Korea
Ryum-Duck Oh
Dept. of Computer Engineering Education, Andong National University, 388 SongChun Dong, Andong, Kyungpook, 760-749, Korea
Wonseog Kang

Editor information

Editors and Affiliations

Dept. of EECS, Korea Advanced Institute of Science and Technology (KAIST), 373-1 Guseong-dong, Yuseong-gu, Daejeon, Republic of Korea
Tag Gon Kim

About this paper

Cite this paper

Kang, HK. et al. (2005). CONDOCS: A Concept-Based Document Categorization System Using Concept-Probability Vector with Thesaurus. In: Kim, T.G. (ed.) Artificial Intelligence and Simulation. AIS 2004. Lecture Notes in Computer Science, vol 3397. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30583-5_72

DOI: https://doi.org/10.1007/978-3-540-30583-5_72
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24476-9
Online ISBN: 978-3-540-30583-5
eBook Packages: Computer Science, Computer Science (R0)
