A Neural Network Document Classifier with Linguistic Feature Selection

Lee, Hahn-Ming; Chen, Chih-Ming; Hwang, Cheng-Wei

doi:10.1007/3-540-45049-1_66

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1821)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

This article presents a neural network document classifier with linguistic feature selection and multi-category output. It consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, we extract terms from the original documents by text processing and then analyze the conformity and uniformity of each term with an entropy function that measures the term's significance. Terms with high significance are selected as input features for the neural network document classifiers. To reduce the input dimension, we merge synonyms: based on the uniformity analysis, we obtain a term similarity matrix by a fuzzy relation operation and use it to construct a synonym thesaurus. In the hierarchical neural network classification unit, we adopt the well-known back-propagation learning model to build the hierarchical classification units. Our experiments use a product description database from an electronic commerce company. The results show that the classifier is accurate enough to assist human classification and can save considerable manpower and working time when classifying a large database.
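
The abstract does not give the entropy function or the fuzzy relation operation in closed form, so the following is only a minimal sketch under stated assumptions: term significance is taken to be one minus the normalized Shannon entropy of a term's occurrence counts over categories, and the fuzzy relation operation is taken to be the standard max-min composition. The function names (`category_entropy`, `significance`, `max_min_compose`) are hypothetical and not taken from the paper.

```python
import math


def category_entropy(term_counts):
    """Normalized Shannon entropy of a term's counts over categories.

    term_counts holds one non-negative count per category. The result is
    in [0, 1]: 0 when the term occurs in a single category only, 1 when
    it is spread evenly over all categories.
    """
    total = sum(term_counts)
    k = len(term_counts)
    if total == 0 or k < 2:
        return 0.0
    h = 0.0
    for c in term_counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h / math.log(k)


def significance(term_counts):
    """Assumed significance score: high when the term is concentrated."""
    return 1.0 - category_entropy(term_counts)


def max_min_compose(r, s):
    """Max-min composition of two fuzzy relations given as nested lists.

    r is n x m and s is m x p; entry (i, j) of the result is
    max over k of min(r[i][k], s[k][j]).
    """
    return [
        [max(min(r[i][k], s[k][j]) for k in range(len(s)))
         for j in range(len(s[0]))]
        for i in range(len(r))
    ]


if __name__ == "__main__":
    # A term seen only in the first of three categories versus one spread evenly.
    print(significance([10, 0, 0]), significance([4, 4, 4]))
    # Composing a small term-by-category relation with a category-by-term one.
    print(max_min_compose([[0.2, 0.9]], [[0.5, 0.1], [0.4, 0.7]]))
```

Under these assumptions, terms whose score exceeds a chosen threshold would be kept as input features, and terms whose composed similarity exceeds another threshold would be merged as synonyms. Both thresholds are design choices that the abstract does not specify.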

Author information

Authors and Affiliations

Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Hahn-Ming Lee, Chih-Ming Chen & Cheng-Wei Hwang

Editor information

Editors and Affiliations

The Center for Advanced Computer Studies, University of Louisiana, 2 Rex Street, Lafayette, LA, 70504-4330, USA
Rasiah Logananthara
Department of Neural Information Processing, University of Ulm, Oberer Eselsberg, 89069, Ulm, Germany
Günther Palm
Department of Computer Science, Southwest Texas State University, 601 University Drive, San Marcos, TX, 78666-4616, USA
Moonis Ali

About this paper

Cite this paper

Lee, HM., Chen, CM., Hwang, CW. (2000). A Neural Network Document Classifier with Linguistic Feature Selection. In: Logananthara, R., Palm, G., Ali, M. (eds) Intelligent Problem Solving. Methodologies and Approaches. IEA/AIE 2000. Lecture Notes in Computer Science, vol 1821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45049-1_66

DOI: https://doi.org/10.1007/3-540-45049-1_66
Published: 18 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67689-8
Online ISBN: 978-3-540-45049-8
eBook Packages: Springer Book Archive
