Combining linear discriminant functions with neural networks for supervised learning

Abstract

A novel supervised learning method is proposed that combines linear discriminant functions with neural networks, resulting in a tree-structured hybrid architecture. Through constructive learning, the binary hierarchical architecture is generated automatically by a controlled growing process for a specific supervised learning task. Unlike a classic decision tree, the proposed method employs linear discriminant functions only at the intermediate levels of the tree, where they heuristically partition a large and complicated task into several smaller and simpler subtasks. These subtasks are then handled by component neural networks at the leaves of the tree. For constructive learning, growing and credit-assignment algorithms are developed to serve the hybrid architecture. The proposed architecture provides an efficient way to apply existing neural networks (e.g. the multi-layered perceptron) to large-scale problems. We have applied the proposed method to a universal approximation problem and to several benchmark classification problems in order to evaluate its performance. Simulation results show that the proposed method yields better results and faster training than the multi-layered perceptron.
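To make the described architecture concrete, the sketch below gives one plausible reading of the abstract in Python: internal nodes split the data with a linear discriminant function, leaves hold small component networks, and the tree grows until each subtask is small or simple enough. The size and depth thresholds, the network hyperparameters, and the use of scikit-learn's LinearDiscriminantAnalysis and MLPClassifier are illustrative assumptions only; the paper's actual growing and credit-assignment algorithms are not reproduced here.

```python
# A minimal, illustrative sketch (not the authors' algorithm): a binary tree whose
# intermediate nodes split the data with a linear discriminant function and whose
# leaves are small component MLPs. Thresholds and hyperparameters are placeholders.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier


class HybridTreeNode:
    def __init__(self, max_leaf_size=200, depth=0, max_depth=4):
        self.max_leaf_size = max_leaf_size   # stop splitting once a subtask is this small
        self.depth = depth
        self.max_depth = max_depth
        self.split = None       # linear discriminant used at an intermediate node
        self.children = None    # (left, right) subtrees
        self.leaf_net = None    # component network used at a leaf
        self.constant = None    # fallback label when a node receives a single class

    def fit(self, X, y):
        classes = np.unique(y)
        if len(classes) < 2:                       # nothing left to discriminate
            self.constant = classes[0]
            return self
        if len(y) <= self.max_leaf_size or self.depth >= self.max_depth:
            return self._grow_leaf(X, y)
        # Heuristically partition the task with a linear discriminant function.
        self.split = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
        right = self.split.transform(X).ravel() >= 0.0
        if right.all() or (~right).all():          # degenerate split: fall back to a leaf
            self.split = None
            return self._grow_leaf(X, y)
        self.children = (
            HybridTreeNode(self.max_leaf_size, self.depth + 1, self.max_depth).fit(X[~right], y[~right]),
            HybridTreeNode(self.max_leaf_size, self.depth + 1, self.max_depth).fit(X[right], y[right]),
        )
        return self

    def _grow_leaf(self, X, y):
        # Each leaf solves its smaller, simpler subtask with its own network.
        self.leaf_net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)
        self.leaf_net.fit(X, y)
        return self

    def predict(self, X):
        if self.constant is not None:
            return np.full(len(X), self.constant)
        if self.leaf_net is not None:              # leaf: delegate to the component network
            return self.leaf_net.predict(X)
        right = self.split.transform(X).ravel() >= 0.0
        out = np.empty(len(X), dtype=object)
        if (~right).any():
            out[~right] = self.children[0].predict(X[~right])
        if right.any():
            out[right] = self.children[1].predict(X[right])
        return out
```

As a usage illustration, `HybridTreeNode().fit(X_train, y_train).predict(X_test)` would grow the tree on the training data and route test points through the discriminants down to the appropriate leaf networks.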

Author information

Corresponding author

Correspondence to Ke Chen.

Cite this article

Chen, K., Yu, X. & Chi, H. Combining linear discriminant functions with neural networks for supervised learning. Neural Comput & Applic 6, 19–41 (1997). https://doi.org/10.1007/BF01670150
