Abstract
Inductive modeling or “machine learning” algorithms can discover structure in high-dimensional data in a nearly automated fashion. These adaptive statistical methods (including decision trees, polynomial networks, projection pursuit models, and additive networks) repeatedly search for, and add on, the model component judged best at that stage. Because the space of possible components is huge, the choice is typically greedy; that is, optimal only in the very short term. In fact, analyst and algorithm are usually greedy at three levels: when choosing 1) a term within a model, 2) a model within a family, and 3) a family within a wide collection of methods. It is better, we argue, to “take a longer view” at each stage. For the first stage (term selection), examples are presented for classification using decision trees and for estimation using regression. To improve the third stage (method selection), we propose fusing information from disparate models to make the combined model more robust. (Fused models merge their output estimates but also share information on, for example, which variables to employ and which cases to ignore.) The benefits of fusing are demonstrated on a challenging classification dataset, where the task is to infer the species of a bat from its chirps.
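The cost of greed at the term-selection level can be seen in a small sketch (the data and names below are illustrative, not taken from the chapter). Here the target is the XOR of two features, so neither feature helps alone; a greedy forward search grabs a noisy shortcut variable first, while a two-step lookahead that scores pairs jointly finds the perfect subset. Subsets are scored by a simple lookup-table fit, standing in for any inductive model.

```python
from itertools import combinations

def mse(rows, feats):
    """Score a feature subset with a lookup-table fit:
    predict the mean y within each distinct feature tuple."""
    groups = {}
    for x, y in rows:
        key = tuple(x[f] for f in feats)
        groups.setdefault(key, []).append(y)
    sse = sum((y - sum(g) / len(g)) ** 2
              for g in groups.values() for y in g)
    return sse / sum(len(g) for g in groups.values())

# Tiny synthetic dataset (hypothetical): y = x2 XOR x3, while x1 is a
# noisy copy of y, flipped in 1 of every 4 cases.
rows = []
for x2 in (0, 1):
    for x3 in (0, 1):
        y = x2 ^ x3
        for i in range(4):
            x1 = y ^ (1 if i == 0 else 0)  # 25% label noise
            rows.append(({'x1': x1, 'x2': x2, 'x3': x3}, y))

feats = ['x1', 'x2', 'x3']

# Greedy forward selection: add the single best term at each step.
chosen = []
for _ in range(2):
    best = min((f for f in feats if f not in chosen),
               key=lambda f: mse(rows, chosen + [f]))
    chosen.append(best)

# Two-step lookahead: score all pairs of terms jointly.
pair = min(combinations(feats, 2), key=lambda p: mse(rows, list(p)))

print(chosen, mse(rows, chosen))          # greedy is stuck with noisy x1
print(list(pair), mse(rows, list(pair)))  # joint search finds {x2, x3}
```

This is the phenomenon Cover (1974) formalized: the best two measurements need not include the best single measurement, which is why restraining greed (searching a little deeper before committing) can pay off.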
© 1996 Springer-Verlag New York, Inc.
Elder, J.F. (1996). Heuristic Search for Model Structure: the Benefits of Restraining Greed. In: Fisher, D., Lenz, HJ. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_13
Print ISBN: 978-0-387-94736-5
Online ISBN: 978-1-4612-2404-4