Abstract
This article presents a neural network document classifier with linguistic feature selection and multi-category output. It consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, we extract terms from a set of original documents by text processing and then analyze the conformity and uniformity of each term with an entropy function that measures the term's significance. Terms with high significance are selected as input features for the neural network document classifier. To reduce the input dimension, we merge synonyms: from the uniformity analysis, we obtain a term similarity matrix by a fuzzy relation operation and use it to construct a synonym thesaurus. In the hierarchical neural network classification unit, we adopt the well-known back-propagation learning model to build the hierarchical classification units. Our experiments use a product description database from an electronic commerce company. The experimental results show that the classifier is accurate enough to assist human classification and can save considerable manpower and working time when classifying a large database.
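The abstract does not give the exact entropy function, so the following is only a minimal sketch of the general idea of entropy-based term significance, not the authors' formulation. It assumes that a term is significant when its occurrences concentrate in few categories, and scores each term as one minus the normalized entropy of its distribution over categories. The function name `term_significance`, the input layout, and the sample terms are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_significance(docs):
    """Score terms by how concentrated they are across categories.

    docs: list of (category, list_of_terms) pairs (assumed input layout).
    Returns {term: score}, where 1.0 means the term occurs in a single
    category and 0.0 means it is spread evenly over all categories.
    """
    categories = {cat for cat, _ in docs}
    counts = defaultdict(Counter)  # term -> Counter(category -> occurrences)
    for cat, terms in docs:
        for term in terms:
            counts[term][cat] += 1

    # Normalizing by log(K) keeps scores in [0, 1]; with a single category
    # the entropy is always 0, so any positive divisor works.
    max_entropy = math.log(len(categories)) if len(categories) > 1 else 1.0

    scores = {}
    for term, per_cat in counts.items():
        total = sum(per_cat.values())
        entropy = -sum((n / total) * math.log(n / total)
                       for n in per_cat.values())
        scores[term] = 1.0 - entropy / max_entropy
    return scores


# Hypothetical product descriptions, already split into terms.
docs = [
    ("audio", ["speaker", "cable"]),
    ("video", ["monitor", "cable"]),
]
scores = term_significance(docs)
# Keep only terms above an (assumed) significance threshold.
selected = sorted(t for t, s in scores.items() if s >= 0.5)
```

In this toy data, "cable" appears equally in both categories and is filtered out, while "speaker" and "monitor" each appear in one category and are kept as input features. The threshold of 0.5 is arbitrary and would need tuning on real data.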
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, HM., Chen, CM., Hwang, CW. (2000). A Neural Network Document Classifier with Linguistic Feature Selection. In: Logananthara, R., Palm, G., Ali, M. (eds) Intelligent Problem Solving. Methodologies and Approaches. IEA/AIE 2000. Lecture Notes in Computer Science, vol 1821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45049-1_66
DOI: https://doi.org/10.1007/3-540-45049-1_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67689-8
Online ISBN: 978-3-540-45049-8
eBook Packages: Springer Book Archive