Abstract
In many practical settings, only a few labels are available for the data, and algorithms must exploit the unlabeled examples to learn effectively. This setting is known as semi-supervised learning (SSL). In this article, we propose a methodology suited to both the representation and the prediction of large datasets in that situation. To this end, groups of uncorrelated attributes are formed in order to overcome the problems raised by high-dimensional spaces. An ensemble is then built by training a self-organizing map (SOM) on each group. Besides supporting prediction, these maps provide a relevant representation of the data that can be exploited for semi-supervised learning. The final prediction is obtained by a vote over the maps. Experiments in both supervised and semi-supervised settings show the relevance of this approach.
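To make the pipeline concrete, here is a minimal, illustrative sketch in Python/NumPy of the three steps the abstract describes: split the attributes into groups, train one small SOM per group, and combine the maps by a plurality vote. All names (`group_features`, `TinySOM`, `ensemble_predict`) are hypothetical; the random round-robin grouping is only a stand-in for the paper's correlation-based attribute grouping, and the SOM update is the textbook online rule rather than the authors' implementation. Unlabeled points are assumed to carry the label -1.

```python
import numpy as np

def group_features(X, n_groups, seed=0):
    """Partition feature indices into n_groups groups.

    Placeholder heuristic: features are shuffled and dealt round-robin.
    (A stand-in for the paper's grouping of uncorrelated attributes.)
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[1])
    return [idx[g::n_groups] for g in range(n_groups)]

class TinySOM:
    """Minimal online self-organizing map, labelled for classification."""
    def __init__(self, grid=(5, 5), n_iter=1000, seed=0):
        self.grid, self.n_iter = grid, n_iter
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        h, w = self.grid
        n_units, d = h * w, X.shape[1]
        self.W = self.rng.normal(size=(n_units, d))
        # 2-D coordinates of each unit, used by the neighbourhood kernel
        coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
        for t in range(self.n_iter):
            x = X[self.rng.integers(len(X))]
            bmu = np.argmin(((self.W - x) ** 2).sum(axis=1))   # best-matching unit
            lr = 0.5 * (1 - t / self.n_iter)                   # decaying learning rate
            sigma = max(1.0, (h / 2) * (1 - t / self.n_iter))  # shrinking neighbourhood
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            nbh = np.exp(-dist2 / (2 * sigma ** 2))
            self.W += lr * nbh[:, None] * (x - self.W)
        # label each unit by the majority class of the labeled points it wins
        wins = np.argmin(((X[:, None, :] - self.W[None]) ** 2).sum(-1), axis=1)
        self.labels_ = np.full(n_units, -1)
        for u in range(n_units):
            ys = y[(wins == u) & (y >= 0)]   # y == -1 marks unlabeled points
            if len(ys):
                self.labels_[u] = np.bincount(ys).argmax()

    def predict(self, X):
        bmu = np.argmin(((X[:, None, :] - self.W[None]) ** 2).sum(-1), axis=1)
        return self.labels_[bmu]

def ensemble_predict(X_train, y_train, X_test, n_groups=4):
    """Train one SOM per attribute group and combine them by plurality vote."""
    votes = []
    for g in group_features(X_train, n_groups):
        som = TinySOM()
        som.fit(X_train[:, g], y_train)
        votes.append(som.predict(X_test[:, g]))
    votes = np.array(votes)
    # plurality vote over the maps, ignoring units that received no label
    return np.array([np.bincount(col[col >= 0]).argmax() if (col >= 0).any() else -1
                     for col in votes.T])
```

Each map is labelled by the majority class of the labeled points falling on each of its units, which is what lets the ensemble make predictions even when most of the training set is unlabeled.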
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prudhomme, E., Lallich, S. (2008). Maps Ensemble for Semi-Supervised Learning of Large High Dimensional Datasets. In: An, A., Matwin, S., Raś, Z.W., Ślęzak, D. (eds) Foundations of Intelligent Systems. ISMIS 2008. Lecture Notes in Computer Science, vol. 4994. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68123-6_11
DOI: https://doi.org/10.1007/978-3-540-68123-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68122-9
Online ISBN: 978-3-540-68123-6