Abstract
A novel supervised learning method is proposed that combines linear discriminant functions with neural networks, yielding a tree-structured hybrid architecture. Through constructive learning, a binary-tree hierarchy is generated automatically by a controlled growing process for a given supervised learning task. Unlike the classic decision tree, the proposed method employs linear discriminant functions only at the intermediate nodes of the tree, where they heuristically partition a large, complicated task into several smaller and simpler subtasks; these subtasks are then handled by component neural networks at the leaves of the tree. Growing and credit-assignment algorithms are developed to support constructive learning in the hybrid architecture. The proposed architecture provides an efficient way to apply existing neural networks (e.g. the multi-layer perceptron) to large-scale problems. The method has been applied to a universal approximation problem and several benchmark classification problems in order to evaluate its performance. Simulation results show that it yields better accuracy and faster training than the multi-layer perceptron.
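The partition-then-specialize scheme described in the abstract can be sketched as follows: internal nodes hold a linear discriminant that routes each sample to a subtree, and leaves hold small component networks trained only on the samples routed to them. This is a minimal illustration under simplifying assumptions, not the paper's algorithm: the growing and credit-assignment procedures are omitted, the root split is fixed by hand rather than learned, and the names `SplitNode` and `LeafNet` are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

class LeafNet:
    """A tiny one-hidden-layer network acting as a component expert at a leaf."""
    def __init__(self, d_in, d_hidden=8, lr=0.3):
        self.W1 = rng.normal(0.0, 0.5, (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 0.5, d_hidden)
        self.b2 = 0.0
        self.lr = lr

    def forward(self, X):
        # tanh hidden layer, sigmoid output for binary classification
        self.H = np.tanh(X @ self.W1 + self.b1)
        return 1.0 / (1.0 + np.exp(-(self.H @ self.W2 + self.b2)))

    def train(self, X, y, epochs=1000):
        # full-batch gradient descent on the cross-entropy loss
        for _ in range(epochs):
            p = self.forward(X)
            g = (p - y) / len(y)                    # dLoss/dlogit
            gh = np.outer(g, self.W2) * (1.0 - self.H ** 2)
            self.W2 -= self.lr * self.H.T @ g
            self.b2 -= self.lr * g.sum()
            self.W1 -= self.lr * X.T @ gh
            self.b1 -= self.lr * gh.sum(axis=0)

class SplitNode:
    """Internal node: a linear discriminant w.x + b routes samples to subtrees."""
    def __init__(self, w, b, left, right):
        self.w, self.b, self.left, self.right = w, b, left, right

    def predict(self, X):
        side = X @ self.w + self.b > 0
        out = np.empty(len(X))
        for mask, child in ((side, self.left), (~side, self.right)):
            if mask.any():
                out[mask] = (child.forward(X[mask]) if isinstance(child, LeafNet)
                             else child.predict(X[mask]))
        return out

# Toy XOR-like task: not linearly separable as a whole, but the hand-chosen
# root split x0 > 0 leaves each leaf with a simple, separable subtask.
X = rng.uniform(-1.0, 1.0, (400, 2))
y = ((X[:, 0] > 0) != (X[:, 1] > 0)).astype(float)

left, right = LeafNet(2), LeafNet(2)
route = X[:, 0] > 0
left.train(X[route], y[route])        # each leaf sees only its partition
right.train(X[~route], y[~route])

tree = SplitNode(np.array([1.0, 0.0]), 0.0, left, right)
acc = ((tree.predict(X) > 0.5) == (y > 0.5)).mean()
```

In the full method the splits and the tree shape are produced by the growing process rather than fixed in advance; the sketch only shows why routing through a linear discriminant turns one hard problem into two easy ones for the leaf networks.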
Cite this article
Chen, K., Yu, X. & Chi, H. Combining linear discriminant functions with neural networks for supervised learning. Neural Comput & Applic 6, 19–41 (1997). https://doi.org/10.1007/BF01670150