Abstract
Learning from data can be a very complex task. Solving a variety of problems satisfactorily may require combining many different types of algorithms. Feature extraction algorithms are valuable tools that prepare data for other learning methods; to estimate their usefulness, one must examine the whole complex processes of which they are parts.
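The point that a feature extraction step should be judged only as part of the whole process it belongs to can be illustrated with a minimal sketch (not taken from the chapter; the data, function names, and the choice of standardization as the "extraction" step and 1-nearest-neighbour as the classifier are all illustrative assumptions): the extraction is re-fitted inside every cross-validation fold, so the score measures the combined pipeline rather than the extraction step in isolation.

```python
# Hedged sketch: score a feature-extraction step only as part of the
# complete pipeline. "Extraction" here is per-feature standardization
# fitted on training data; the learner is a 1-nearest-neighbour rule;
# leave-one-out cross-validation evaluates the combined process.
import math

# Toy 2-class data as (features, label); values are illustrative only.
DATA = [
    ([1.0, 200.0], 0), ([1.2, 210.0], 0), ([0.9, 190.0], 0),
    ([3.0, 400.0], 1), ([3.2, 410.0], 1), ([2.9, 390.0], 1),
]

def fit_standardizer(rows):
    """Learn per-feature mean and std from the training rows only."""
    dim = len(rows[0][0])
    means = [sum(x[i] for x, _ in rows) / len(rows) for i in range(dim)]
    stds = []
    for i in range(dim):
        var = sum((x[i] - means[i]) ** 2 for x, _ in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # guard against zero variance
    return means, stds

def transform(x, means, stds):
    return [(xi - m) / s for xi, m, s in zip(x, means, stds)]

def predict_1nn(train, query):
    """Label of the nearest training point in the transformed space."""
    best = min(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], query)))
    return best[1]

def loo_accuracy(data):
    """Leave-one-out accuracy of the whole extraction+classifier pipeline."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        # Re-fit the extraction step inside the fold, on training data only.
        means, stds = fit_standardizer(train)
        scaled = [(transform(xs, means, stds), ys) for xs, ys in train]
        hits += predict_1nn(scaled, transform(x, means, stds)) == y
    return hits / len(data)

print(loo_accuracy(DATA))  # 1.0 on this well-separated toy set
```

Fitting the standardizer on the full data set before cross-validating would leak information from the held-out point into the extraction step; re-fitting it per fold is what makes the estimate describe the complete process.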
© 2006 Springer-Verlag Berlin Heidelberg
Cite this chapter
Jankowski, N., Grabczewski, K. (2006). Learning Machines. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35487-1
Online ISBN: 978-3-540-35488-8
eBook Packages: Engineering (R0)