Learning to Learn: Introduction and Overview

  • Chapter
Learning to Learn

Abstract

Over the past three decades or so, research on machine learning and data mining has led to a wide variety of algorithms that learn general functions from experience. As machine learning matures, it has begun to make a successful transition from academic research to practical applications. Generic techniques such as decision trees and artificial neural networks, for example, are now used in a range of commercial and industrial applications (see, e.g., [Langley, 1992; Widrow et al., 1994]).

References

  • Y. S. Abu-Mostafa. A method for learning from hints. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 73–80, San Mateo, CA, 1993. Morgan Kaufmann.

  • W.-K. Ahn and W. F. Brewer. Psychological studies of explanation-based learning. In G. DeJong, editor, Investigating Explanation-Based Learning. Kluwer Academic Publishers, Boston/ Dordrecht/London, 1993.

  • W.-K. Ahn, R. Mooney, W. F. Brewer, and G. F. DeJong. Schema acquisition from one example: Psychological evidence for explanation-based learning. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Seattle, WA, July 1987.

  • C. A. Atkeson. Using locally weighted regression for robot learning. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, pages 958–962, Sacramento, CA, April 1991.

  • A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81–138, 1995.

  • J. Baxter. The Canonical Distortion Measure for Vector Quantization and Function Approximation. Chapter 7 in this book.

  • J. Baxter. Learning Internal Representations. PhD thesis, Flinders University, Australia, 1995.

  • D. Beymer and T. Poggio. Face recognition from one model view. In Proceedings of the International Conference on Computer Vision, 1995.

  • A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam’s razor. Information Processing Letters, 24:377–380, 1987.

  • C. E. Brodley. Recursive Automatic Algorithm Selection for Inductive Learning. PhD thesis, University of Massachusetts, Amherst, MA 01003, August 1994. Also available as COINS Technical Report 94-61.

  • R. Caruana. Multitask learning: A knowledge-based source of inductive bias. In P. E. Utgoff, editor, Proceedings of the Tenth International Conference on Machine Learning, pages 41–48, San Mateo, CA, 1993. Morgan Kaufmann.

  • R. Caruana. Algorithms and applications for multitask learning. In L. Saitta, editor, Proceedings of the Thirteenth International Conference on Machine Learning, San Mateo, CA, July 1996. Morgan Kaufmann.

  • R. Caruana and S. Baluja. Using the future to ‘sort out’ the present: Rankprop and multitask learning for medical risk evaluation. In D. Touretzky, M. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, Cambridge, MA, 1996. MIT Press. To appear.

  • R. Caruana, D. L. Silver, J. Baxter, T. M. Mitchell, L. Y. Pratt, and S. Thrun. Workshop on “Learning to learn: Knowledge consolidation and transfer in inductive systems”. Workshop, held at NIPS-95, Vail, CO, see World Wide Web at http://www.cs.cmu, December 1995.

  • N.L. Cramer. A representation for the adaptive generation of simple sequential programs. In J.J. Grefenstette, editor, Proceedings of First International Conference on Genetic Algorithms and their Applications, pages 183–187, Pittsburgh, PA, 1985.

  • P. Dayan and G. E. Hinton. Feudal reinforcement learning. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 5, San Mateo, CA, 1993. Morgan Kaufmann.

  • L. De Raedt, N. Lavrač, and S. Džeroski. Multiple predicate learning. In Proceedings of IJCAI-93, pages 1037–1042, Chambéry, France, July 1993. IJCAI, Inc.

  • A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82:247–261, 1989.

  • R. Franke. Scattered data interpolation: Tests of some methods. Mathematics of Computation, 38(157):181–200, January 1982.

  • J. H. Friedman. Flexible metric nearest neighbor classification. November 1994.

  • S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4:1–58, 1992.

  • T. Hastie and R. Tibshirani. Discriminant adaptive nearest neighbor classification. Submitted for publication, December 1994.

  • H. Hild and A. Waibel. Multi-speaker/speaker-independent architectures for the multi-state time delay neural network. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pages II 255–258. IEEE, April 1993.

  • T. Hume and M.J. Pazzani. Learning sets of related concepts: A shared task model. In Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society, 1996.

  • L. P. Kaelbling. Hierarchical learning in stochastic domains: Preliminary results. In P. E. Utgoff, editor, Proceedings of the Tenth International Conference on Machine Learning, pages 167–173, San Mateo, CA, 1993. Morgan Kaufmann.

  • M. Kearns and U. Vazirani. Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, 1994.

  • J. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992.

  • J. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA, 1994.

  • J. Laird, P. Rosenbloom, and A. Newell. Chunking in SOAR: The anatomy of a general learning mechanism. Machine Learning, 1(1):11–46, 1986.

  • M. Lando and S. Edelman. Generalizing from a single view in face recognition. Technical Report CS-TR 95-02, Department of Applied Mathematics and Computer Science, The Weizmann Institute of Science, Rehovot 76100, Israel, January 1995.

  • P. Langley. Areas of application for machine learning. In Proceedings of the Fifth International Symposium on Knowledge Engineering, Sevilla, 1992.

  • L.-J. Lin. Self-supervised Learning by Reinforcement and Artificial Neural Networks. PhD thesis, Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, 1992.

  • B. Mel. Seemore: A view-based approach to 3-d object recognition using multiple visual cues. In M.C. Mozer, D.S. Touretzky and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, December 1996.

  • T. M. Mitchell. The need for biases in learning generalizations. Technical Report CBM-TR-117, Computer Science Department, Rutgers University, New Brunswick, NJ 08904, 1980. Also appeared in: Readings in Machine Learning, J. Shavlik and T.G. Dietterich (eds.), Morgan Kaufmann.

  • T. M. Mitchell. Machine Learning. McGraw-Hill, NY, in preparation.

  • T. M. Mitchell and S. Thrun. Explanation-based neural network learning for robot control. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 287–294, San Mateo, CA, 1993. Morgan Kaufmann.

  • R. J. Mooney and D. Ourston. A multistrategy approach to theory refinement. In R. S. Michalski and G. Tecuci, editors, Proceedings of the International Workshop on Multistrategy Learning, pages 207–214. Morgan Kaufmann, 1992.

  • A. W. Moore. Efficient Memory-based Learning for Robot Control. PhD thesis, Trinity Hall, University of Cambridge, England, 1990.

  • A. W. Moore, D. J. Hill, and M. P. Johnson. An Empirical Investigation of Brute Force to choose Features, Smoothers and Function Approximators. In S. Hanson, S. Judd, and T. Petsche, editors, Computational Learning Theory and Natural Learning Systems, Volume 3. MIT Press, 1992.

  • Y. Moses, S. Ullman, and S. Edelman. Generalization across changes in illumination and viewing position in upright and inverted faces. Technical Report CS-TR 93-14, Department of Applied Mathematics and Computer Science, The Weizmann Institute of Science, Rehovot 76100, Israel, 1993.

  • S. Muggleton. Inductive Logic Programming. Academic Press, New York, 1992.

  • J. O’Sullivan. Integrating initialization bias and search bias in artificial neural networks. Internal report, January 1996.

  • T. Poggio and T. Vetter. Recognition and structure from one 2d model view: Observations on prototypes, object classes and symmetries. A.I. Memo No. 1347, 1992.

  • D. A. Pomerleau. Knowledge-based training of artificial neural networks for autonomous robot driving. In J. H. Connell and S. Mahadevan, editors, Robot Learning, pages 19–43. Kluwer Academic Publishers, 1993.

  • L. Y. Pratt. Transferring Previously Learned Back-Propagation Neural Networks to New Learning Tasks. PhD thesis, Rutgers University, Department of Computer Science, New Brunswick, NJ 08904, May 1993. Also appeared as Technical Report ML-TR-37.

  • L. Y. Pratt and B. Jennings. A review of transfer between connectionist networks. Connection Science, 8(2):163–184, 1996. Reprinted as Chapter 2 in this book.

  • J. R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239–266, 1990.

  • L. Rendell, R. Seshu, and D. Tcheng. Layered concept-learning and dynamically-variable bias management. In Proceedings of IJCAI-87, pages 308–314, 1987.

  • M. B. Ring. Two methods for hierarchy learning in reinforcement environments. In From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, pages 148–155. MIT Press, 1993.

  • M. B. Ring. Continual Learning in Reinforcement Environments. R. Oldenbourg Verlag, München, Wien, 1995.

  • S.J. Russell. Prior knowledge and autonomous learning. Robotics and Autonomous Systems, 8:145–159, 1991.

  • J. H. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Technische Universität München, January 1995. Revised version.

  • J.H. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Master’s thesis, Technische Universität München, München, Germany, 1987.

  • J.H. Schmidhuber. A general method for incremental self-improvement and multi-agent learning in unrestricted environments. In X. Yao, editor, Evolutionary Computation: Theory and Applications, Singapore, 1996. Scientific Publishing Co.

  • N. E. Sharkey and A. J. C. Sharkey. Adaptive generalization and the transfer of knowledge. In Proceedings of the Second Irish Neural Networks Conference, Belfast, 1992.

  • B. Silver. Using Meta-level inference to Constrain Search and to Learn Strategies in Equation Solving. PhD thesis, Department of Artificial Intelligence, University of Edinburgh, 1984.

  • P. Simard, B. Victorri, Y. LeCun, and J. Denker. Tangent prop - a formalism for specifying selected invariances in an adaptive network. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 895–903, San Mateo, CA, 1992. Morgan Kaufmann.

  • S. P. Singh. Transfer of learning by composing solutions for elemental sequential tasks. Machine Learning, 8, 1992.

  • C. Stanfill and D. Waltz. Towards memory-based reasoning. Communications of the ACM, 29(12):1213–1228, December 1986.

  • S. C. Suddarth and A. Holden. Symbolic neural systems and the use of hints for developing complex systems. International Journal of Man-Machine Studies, 35, 1991.

  • S. C. Suddarth and Y. L. Kergosien. Rule-injection hints as a means of improving network performance and learning time. In Proceedings of the EURASIP Workshop on Neural Networks, Sesimbra, Portugal, Feb 1990. EURASIP.

  • R. S. Sutton. Adapting bias by gradient descent: An incremental version of delta-bar-delta. In Proceedings of the Tenth National Conference on Artificial Intelligence AAAI-92, pages 171–176, Menlo Park, CA, July 1992. AAAI, AAAI Press/The MIT Press.

  • R. S. Sutton, editor. Reinforcement Learning. Kluwer Academic Publishers, Boston, MA, 1992.

  • A. Teller. Evolving programmers: The co-evolution of intelligent recombination operators. In P. Angeline and K. Kinnear, editors, Advances in Genetic Programming II, Cambridge, MA, 1996. MIT Press.

  • A. Teller and M. Veloso. PADO: A new learning architecture for object recognition. In K. Ikeuchi and M. Veloso, editors, Symbolic Visual Learning. Oxford University Press, 1996.

  • S. Thrun. Explanation-Based Neural Network Learning: A Lifelong Learning Approach. Kluwer Academic Publishers, Boston, MA, 1996.

  • S. Thrun and T. M. Mitchell. Integrating inductive neural network learning and explanation-based learning. In Proceedings of IJCAI-93, Chambéry, France, July 1993. IJCAI, Inc.

  • S. Thrun and J. O’Sullivan. Discovering structure in multiple learning tasks: The TC algorithm. In L. Saitta, editor, Proceedings of the Thirteenth International Conference on Machine Learning, San Mateo, CA, July 1996. Morgan Kaufmann.

  • S. Thrun and A. Schwartz. Finding structure in reinforcement learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995. MIT Press.

  • P. E. Utgoff. Machine Learning of Inductive Bias. Kluwer Academic Publishers, 1986.

  • P. E. Utgoff. Shift of bias for inductive concept learning. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, Volume II. Morgan Kaufmann, 1986.

  • L. G. Valiant. A theory of the learnable. Communications of the ACM, 27:1134–1142, 1984.

  • V. Vapnik. Estimations of dependences based on statistical data. Springer Publisher, 1982.

  • S. Whitehead, J. Karlsson, and J. Tenenberg. Learning multiple goal behavior via task decomposition and dynamic policy merging. In J. H. Connell and S. Mahadevan, editors, Robot Learning, pages 45–78. Kluwer Academic Publishers, 1993.

  • B. Widrow, D. E. Rumelhart, and M. A. Lehr. Neural networks: Applications in industry, business and science. Communications of the ACM, 37(3):93–105, March 1994.

  • D. H. Wolpert. Off-training set error and a priori distinctions between learning algorithms. Technical Report SFI TR 95-01-003, Santa Fe Institute, Santa Fe, NM 87501, 1994.

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Thrun, S., Pratt, L. (1998). Learning to Learn: Introduction and Overview. In: Thrun, S., Pratt, L. (eds) Learning to Learn. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5529-2_1

  • DOI: https://doi.org/10.1007/978-1-4615-5529-2_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7527-2

  • Online ISBN: 978-1-4615-5529-2

  • eBook Packages: Springer Book Archive
