A Hierarchical Neural Network Document Classifier with Linguistic Feature Selection

Chen, Chih-Ming; Lee, Hahn-Ming; Hwang, Cheng-Wei

doi:10.1007/s10489-005-4613-0

Published: December 2005

Applied Intelligence, Volume 23, pages 277–294 (2005)

Abstract

This article presents a neural network document classifier with linguistic feature selection and multi-category output. The classifier consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, candidate terms are extracted from the original documents by text processing techniques, and the conformity and uniformity of each term are then analyzed with an entropy function that measures term significance. Terms with high significance are selected as input features for training the neural network document classifiers. To reduce the input dimensions, a composition mechanism of fuzzy relations is used to identify synonyms, from which a synonym thesaurus is constructed. To simplify the learning scheme, the well-known back-propagation learning model is used to build the hierarchical classification units. The experiments use a product description database from an electronic commerce company. The results show that the classifier is accurate enough to assist human classification and that it can save considerable manpower and working time when classifying a large database.
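
The abstract does not give the entropy function or the fuzzy composition in closed form, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes that term significance can be approximated as one minus the normalized Shannon entropy of a term's occurrence counts across categories, and that synonym candidates can be found with the standard max-min composition of a fuzzy term-similarity relation. The function names, the sample counts, and the similarity values are all hypothetical.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of a term's category distribution, scaled to [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


def significance(counts):
    """Assumed score: high when a term is concentrated in few categories."""
    if sum(counts) == 0:
        return 0.0
    return 1.0 - normalized_entropy(counts)


def max_min_composition(r, s):
    """Max-min composition of two fuzzy relations stored as nested lists."""
    return [
        [max(min(r[i][k], s[k][j]) for k in range(len(s)))
         for j in range(len(s[0]))]
        for i in range(len(r))
    ]


# Hypothetical occurrence counts of two terms over three categories.
focused_term = [9, 0, 0]   # appears in one category only
generic_term = [3, 3, 3]   # spread evenly over all categories
print(significance(focused_term), significance(generic_term))

# Hypothetical fuzzy similarity relation between three terms.
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.6],
    [0.1, 0.6, 1.0],
]
# Composing the relation with itself surfaces indirect similarity:
# terms 0 and 2 are linked through term 1.
indirect = max_min_composition(similarity, similarity)
print(indirect[0][2])
```

In this sketch a term that occurs in a single category receives the highest score and an evenly spread term the lowest, which matches the idea of keeping only highly significant terms as input features. Term pairs whose composed similarity exceeds a chosen threshold could then be merged into one thesaurus entry to shrink the input layer; the threshold itself would have to be tuned on the data.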

Author information

Authors and Affiliations

Graduate Institute of Learning Technology, National Hualien University of Education, 123 Hua-Hsi Rd., Hualien, 970, Taiwan, Republic of China
Chih-Ming Chen
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 43 Sec. 4, Keelung Rd., Taipei, 106, Taiwan, Republic of China
Hahn-Ming Lee & Cheng-Wei Hwang

Corresponding author

Correspondence to Chih-Ming Chen.

About this article

Cite this article

Chen, CM., Lee, HM. & Hwang, CW. A Hierarchical Neural Network Document Classifier with Linguistic Feature Selection. Appl Intell 23, 277–294 (2005). https://doi.org/10.1007/s10489-005-4613-0
