A Document Classification Algorithm Using the Fuzzy Set Theory and Hierarchical Structure of Document

Han, Seok-Woo; Eun, Hye-Jue; Kim, Yong-Sung; Kóczy, László T.

doi:10.1007/978-3-540-24707-4_16

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3043)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

At present, information retrieval systems typically rely on direct keyword matching, with queries expressed as combinations of keywords and phrases. Web document retrieval systems, however, return too many documents because of term ambiguity. It also often happens that a word with several meanings occurs in a document in a context quite different from the one the querying person expects, so the user needs extra time and effort to find more relevant documents. To overcome these problems, this paper proposes a content-based information retrieval system that connects documents according to their degree of semantic link, which is expressed as a fuzzy value computed by a fuzzy function. We also propose an algorithm that produces a hierarchical structure using the degree of concepts and contents among documents. As a result, we are able to select and provide the documents the user is interested in.
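The abstract does not give the fuzzy function or the hierarchy-building algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes term memberships are normalized term frequencies, measures the degree of link between two documents with a fuzzy Jaccard ratio (sum of minima over sum of maxima), and derives a hierarchy from alpha-cuts of the max-min transitive closure of that relation. All function names and the sample documents are hypothetical.

```python
from collections import Counter


def term_memberships(text):
    """Fuzzy membership of each term: its count divided by the count of the
    most frequent term (an assumed membership function)."""
    counts = Counter(text.lower().split())
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def fuzzy_link(a, b):
    """Degree of link in [0, 1]: sum of min memberships over sum of max."""
    terms = set(a) | set(b)
    inter = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    union = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return inter / union if union else 0.0


def transitive_closure(rel):
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation
    (Warshall-style update over intermediate documents k)."""
    n = len(rel)
    closure = [row[:] for row in rel]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(closure[i][k], closure[k][j])
                closure[i][j] = max(closure[i][j], via_k)
    return closure


def alpha_cut_classes(closure, alpha):
    """Partition document indices into classes linked with degree >= alpha."""
    classes, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= alpha]
        seen.update(group)
        classes.append(group)
    return classes


# Hypothetical sample documents, for illustration only.
docs = [
    "fuzzy set theory",
    "fuzzy set logic",
    "web search engine",
    "web search ranking",
]
members = [term_memberships(d) for d in docs]
relation = [[fuzzy_link(a, b) for b in members] for a in members]
closure = transitive_closure(relation)

# Lower thresholds merge classes; higher thresholds split them.
for alpha in (0.0, 0.5, 0.6):
    print(alpha, alpha_cut_classes(closure, alpha))
```

Because the closure is a fuzzy equivalence relation, the classes obtained at increasing thresholds are nested, which is what gives the grouping its hierarchical shape. A different membership function or link measure could be substituted without changing the rest of the sketch.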

Author information

Authors and Affiliations

Dept. of Computer Application and Development, Wonkwang Health Science College, Iksan, 561-756, Korea
Seok-Woo Han
Dept. of Computer Information College of Engineering, Chonbuk National University, Chonju, 561-756, Korea
Hye-Jue Eun & Yong-Sung Kim
Dept. of Telecommunication and Telematics, Technical University of Budapest, Budapest, H-1521, Hungary
László T. Kóczy

Editor information

Editors and Affiliations

Department of Chemistry, University of Perugia, Via Elce di Sotto, 8, I-06123, Perugia, Italy
Antonio Laganá
Department of Computer Science, University of Calgary, 2500 University Drive N.W., T2N 1N4, Calgary, AB, Canada
Marina L. Gavrilova
William Norris Professor, Head of the Computer Science and Engineering Department, University of Minnesota, USA
Vipin Kumar
School of Computing, Soongsil University, Seoul, Korea
Youngsong Mun
OptimaNumerics Ltd., Cathedral House, 23-31 Waring Street, BT1 2DX, Belfast, UK
C. J. Kenneth Tan
Department of Mathematics and Computer Science, University of Perugia, via Vanvitelli, 1, I-06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

Han, SW., Eun, HJ., Kim, YS., Kóczy, L.T. (2004). A Document Classification Algorithm Using the Fuzzy Set Theory and Hierarchical Structure of Document. In: Laganá, A., Gavrilova, M.L., Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds) Computational Science and Its Applications – ICCSA 2004. ICCSA 2004. Lecture Notes in Computer Science, vol 3043. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24707-4_16

DOI: https://doi.org/10.1007/978-3-540-24707-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22054-1
Online ISBN: 978-3-540-24707-4
eBook Packages: Springer Book Archive
