A Neural Network Document Classifier with Linguistic Feature Selection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1821)

Abstract

In this article, we present a neural network document classifier with linguistic feature selection and multi-category output. It consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, we extract terms from the original documents by text processing and then analyze the conformity and uniformity of each term with an entropy function that measures the term's significance. Terms with high significance are selected as input features for the neural network document classifiers. To reduce the input dimension, we merge synonyms: from the uniformity analysis, we obtain a term similarity matrix by a fuzzy relation operation and use it to construct a synonym thesaurus. In the hierarchical neural network classification unit, we adopt the well-known back-propagation learning model to build the hierarchical classification units. In our experiments, we use a product description database from an electronic commerce company. The experimental results show that the classifier is accurate enough to assist human classification, saving considerable manpower and working time when classifying a large database.
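The abstract does not give the entropy function itself, so the following is only a minimal sketch of what entropy-based term selection could look like, not the authors' method. It assumes Shannon entropy over a term's distribution across categories, normalized to [0, 1], and treats low entropy (a term concentrated in few categories) as high significance. The function names, the `max_entropy` threshold, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_category_entropy(docs):
    """Return {term: normalized entropy of its distribution over categories}.

    docs is a list of (category, list_of_terms) pairs.
    Assumption: Shannon entropy stands in for the paper's entropy function.
    """
    counts = defaultdict(Counter)  # term -> Counter(category -> occurrences)
    categories = set()
    for category, terms in docs:
        categories.add(category)
        for term in terms:
            counts[term][category] += 1

    # Normalize by log(number of categories) so scores fall in [0, 1].
    norm = math.log(len(categories)) if len(categories) > 1 else 1.0
    entropy = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        h = sum((c / total) * math.log(total / c) for c in per_category.values())
        entropy[term] = h / norm
    return entropy


def select_terms(docs, max_entropy=0.5):
    """Keep terms whose normalized entropy is at most max_entropy (hypothetical cutoff)."""
    scores = term_category_entropy(docs)
    return sorted(term for term, h in scores.items() if h <= max_entropy)


# Toy product descriptions (made up for illustration).
docs = [
    ("printer", ["laser", "printer", "color"]),
    ("monitor", ["lcd", "monitor", "color"]),
    ("printer", ["printer", "toner"]),
]
selected = select_terms(docs)
```

In this toy data, "color" occurs equally in both categories, so it gets the maximum normalized entropy and is dropped, while terms confined to one category are kept. The synonym-merging and hierarchical back-propagation stages are not sketched here because the abstract does not specify them in enough detail.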

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, HM., Chen, CM., Hwang, CW. (2000). A Neural Network Document Classifier with Linguistic Feature Selection. In: Logananthara, R., Palm, G., Ali, M. (eds) Intelligent Problem Solving. Methodologies and Approaches. IEA/AIE 2000. Lecture Notes in Computer Science, vol 1821. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45049-1_66

  • DOI: https://doi.org/10.1007/3-540-45049-1_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67689-8

  • Online ISBN: 978-3-540-45049-8

  • eBook Packages: Springer Book Archive
