Abstract
Meta-learning has many aspects, but its ultimate goal is to automatically discover many interesting models for given data. Our early attempts in this area involved heterogeneous learning systems combined with a complexity-guided search for optimal models, performed within the framework of (dis)similarity-based methods to discover “knowledge granules”. This approach, inspired by neurocognitive mechanisms of information processing in the brain, is generalized here to learning based on parallel chains of transformations that extract useful information granules and use them as additional features. Various types of transformations that generate hidden features are analyzed, together with methods for generating them. They include restricted random projections, optimization of these features using projection pursuit methods, similarity-based and general kernel-based features, conditionally defined features, features derived from partial successes of various learning algorithms, and whole learning models used as new features. In the enhanced feature space the goal of learning is to create an image of the input data that can be handled directly by relatively simple decision processes. The focus is on hierarchical methods for generating information, starting from new support features discovered by different types of data models created on similar tasks, and successively building more complex features on the enhanced feature spaces. The resulting algorithms facilitate deep learning, and also enable understanding of structures present in the data through visualization of the results of data transformations and through logical, fuzzy and prototype-based rules built from the new features. Relations to various machine-learning approaches, comparisons of results, and neurocognitive inspirations for meta-learning are discussed.
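To make the enhanced-feature-space idea concrete, here is a minimal sketch of the pipeline the abstract describes: the input space is extended with random-projection features and with similarity (kernel) features computed against a few reference vectors, and a simple linear decision process then works in the enhanced space. It uses NumPy and scikit-learn; the feature counts, the `tanh` squashing, the `gamma` value, and the use of random training examples as prototypes are illustrative assumptions, not the chapter's exact procedure.

```python
# Sketch of support-feature enhancement: original inputs are augmented with
# (a) random-projection features and (b) kernel similarities to reference
# vectors, after which a simple linear model makes the decision.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# (a) Random projections z = tanh(X @ W); each column is a candidate hidden
# feature. In the chapter's spirit these would be filtered or optimized
# (e.g. by projection pursuit); here they are kept as generated.
W = rng.normal(size=(X_tr.shape[1], 20)) / np.sqrt(X_tr.shape[1])

# (b) Similarity-based features: RBF kernel values against reference vectors
# (random training examples standing in for learned prototypes).
refs = X_tr[rng.choice(len(X_tr), size=10, replace=False)]

def enhance(X):
    """Concatenate original, projected, and similarity-based features."""
    return np.hstack([X, np.tanh(X @ W), rbf_kernel(X, refs, gamma=0.05)])

# A relatively simple decision process suffices in the enhanced space.
clf = LogisticRegression(max_iter=1000).fit(enhance(X_tr), y_tr)
print("accuracy in enhanced space:", clf.score(enhance(X_te), y_te))
```

The same pattern iterates hierarchically: outputs of models trained in one enhanced space can themselves be appended as features for the next stage.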
Cite this chapter
Duch, W., Maszczyk, T., Grochowski, M.: Optimal Support Features for Meta-Learning. In: Jankowski, N., Duch, W., Grąbczewski, K. (eds.) Meta-Learning in Computational Intelligence. Studies in Computational Intelligence, vol. 358. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-20980-2_10