Abstract
We introduce a new approach to the training of classifiers for performance on multiple tasks. The proposed hybrid training method leads to improved generalization via a better low-dimensional representation of the problem space. The quality of the representation is assessed by embedding it in a 2D space using multidimensional scaling, allowing a direct visualization of the results. The performance of the approach is demonstrated on a highly nonlinear image classification task.
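The 2D embedding step mentioned above can be illustrated with a minimal sketch. It assumes classical (metric) multidimensional scaling computed from a matrix of pairwise distances; the abstract does not say which MDS variant the chapter uses, so this stands in for the idea rather than for the authors' procedure. The function name `classical_mds` and the random toy points are hypothetical and are not taken from the chapter.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from pairwise distances.

    Classical (Torgerson) scaling: double-center the squared distances
    and keep the leading eigenvectors. This is an assumed stand-in for
    whatever MDS variant the chapter actually uses.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Centering matrix J = I - (1/n) * 11^T
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gram matrix of the centered configuration: B = -1/2 * J D^2 J
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order for symmetric input
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by round-off
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical toy data: six points that already lie in a plane,
# standing in for a learned low-dimensional representation.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 2))
pairwise = np.linalg.norm(hidden[:, None, :] - hidden[None, :, :], axis=-1)

embedding = classical_mds(pairwise)  # shape (6, 2), ready to scatter-plot
```

Because the toy points are planar, the embedding reproduces their pairwise distances up to rotation and reflection. For a genuinely higher-dimensional representation the 2D picture is only an approximation, and how well it preserves the original distances has to be checked separately.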
© 1996 Springer Science+Business Media New York
About this chapter
Cite this chapter
Intrator, N., Edelman, S. (1996). Making a Low-Dimensional Representation Suitable for Diverse Tasks. In: Thrun, S., Pratt, L. (eds) Learning to Learn. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5529-2_6
DOI: https://doi.org/10.1007/978-1-4615-5529-2_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7527-2
Online ISBN: 978-1-4615-5529-2
eBook Packages: Springer Book Archive