Abstract
Approaches to document classification belong to two major families: similarity-based (crisp) classification methods and neural-network (gradual) ones. For gradual techniques, a major open issue is controlling the dimension of the search space. While similarity-based methods identify clusters using the same number of variables as the document encoding, neural networks automatically identify the variables that cause distinctions among clusters. The number of variables may therefore vary with the documents' structure and content, and is difficult to estimate a priori. This paper proposes a hybrid classification method suitable for heterogeneous document bases such as those commonly encountered in business and knowledge management applications. Our method is based on an evolutionary algorithm that tunes both the structure and the weights of a neural network. While searching for the optimal network configuration, it is possible to determine the minimal number of variables needed to classify the given set of documents.
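The idea of evolving structure and weights together, while tracking how many input variables are really needed, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single threshold unit instead of a full neural network, treats the "structure" as a boolean mask over input variables, and uses a hypothetical fitness that subtracts a small penalty per active variable. All names (`predict`, `fitness`, `mutate`, `evolve`, `make_toy_data`) and all parameter values are illustrative assumptions.

```python
import random


def predict(ind, x):
    """Threshold unit that only sees the variables enabled in the mask."""
    mask, weights, bias = ind
    s = bias + sum(w * xi for m, w, xi in zip(mask, weights, x) if m)
    return 1 if s > 0 else 0


def fitness(ind, data, penalty=0.01):
    """Accuracy minus a small cost per active input variable (assumed form)."""
    mask = ind[0]
    correct = sum(predict(ind, x) == y for x, y in data)
    return correct / len(data) - penalty * sum(mask)


def mutate(ind, rng, flip_prob=0.1, sigma=0.3):
    """Flip mask bits (structure) and perturb weights and bias (parameters)."""
    mask, weights, bias = ind
    new_mask = [(not m) if rng.random() < flip_prob else m for m in mask]
    new_weights = [w + rng.gauss(0, sigma) for w in weights]
    return (new_mask, new_weights, bias + rng.gauss(0, sigma))


def evolve(data, n_vars, pop_size=30, generations=60, seed=0):
    """Truncation selection with elitism; returns the best individual found
    and the best fitness recorded at each generation."""
    rng = random.Random(seed)
    pop = [
        (
            [rng.random() < 0.5 for _ in range(n_vars)],
            [rng.gauss(0, 1) for _ in range(n_vars)],
            rng.gauss(0, 1),
        )
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, data), reverse=True)
        history.append(fitness(pop[0], data))
        parents = pop[: pop_size // 3]
        # Parents survive unchanged; the rest are mutated copies of parents.
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    best = max(pop, key=lambda ind: fitness(ind, data))
    return best, history


def make_toy_data(n=200, n_vars=6, seed=1):
    """Synthetic stand-in for encoded documents: only variables 0 and 1
    determine the class; the remaining variables are noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(n_vars)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    best, history = evolve(data, n_vars=6)
    print("active variables:", [i for i, m in enumerate(best[0]) if m])
    print("fitness:", round(fitness(best, data), 3))
```

Because the penalty term rewards smaller masks, the search is pushed toward individuals that classify the data with as few variables as possible, which is the role the abstract assigns to the search for an optimal network configuration. A real implementation would evolve hidden layers and connections as well, and would measure accuracy on held-out documents.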
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Azzini, A., Ceravolo, P. (2006). Evolutionary ANNs for Improving Accuracy and Efficiency in Document Classification Methods. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893011_140
DOI: https://doi.org/10.1007/11893011_140
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46542-3
Online ISBN: 978-3-540-46544-7
eBook Packages: Computer Science (R0)