Learning Machines

Chapter in: Feature Extraction

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 207))


Learning from data can be a very complex task. Solving a variety of problems satisfactorily may require combining many different types of algorithms. Feature extraction algorithms are valuable tools that prepare data for other learning methods. To estimate their usefulness, one must examine the whole complex processes of which they are a part.





© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Jankowski, N., Grabczewski, K. (2006). Learning Machines. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg.

  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35487-1

  • Online ISBN: 978-3-540-35488-8

  • eBook Packages: Engineering, Engineering (R0)
