A model for parameter setting based on Bayesian networks
Introduction
There are a variety of situations in which a researcher is faced with a modeling problem. Modeling is a process through which a model M (a function or algorithm) is constructed to explain the behavior of a system and to predict unknown answers. Once the model of the studied system has been established on the basis of a set of parameters, the parameter values that make the model produce the best results must be determined.
In this work, we are interested in the parameter setting of models used in the artificial intelligence (AI) field, such as artificial neural networks, genetic algorithms (GAs), clustering algorithms and so on (Duda et al., 2001). Parameter setting of a model can be carried out either by applying mechanisms specific to the model (Friedrichs and Igel, 2004) or by applying general optimization techniques used in system modeling (Fletcher, 2000; Rao, 1996). In many other cases, choosing the parameters involves consulting the specialized technical literature or simply resorting to trial and error.
This work has utility in the engineering field, where a variety of AI models and techniques are used to solve a whole range of problems, but such techniques are often not known in depth. Our work could help in these situations. As an example to illustrate its usefulness, a problem from graphic design was selected (the root identification problem).
In a deeper analysis, three approaches to the problem of parameter setting in AI can be distinguished: the evolutionary approach (used by evolutionary algorithms: GAs (Goldberg, 1989), evolution strategies (Schwefel, 1995) and evolutionary programming (Fogel, 1999)), the model selection approach and the statistical approach.
The evolutionary approach is based on adapting the parameters during the evolutionary algorithm run. Parameter control techniques can be subdivided into three types: deterministic, adaptive and self-adaptive (Eiben et al., 1999). In deterministic control, the parameters are changed according to deterministic rules without using any feedback from the search. Adaptive control takes place when some form of feedback influences the parameter specification. Examples of adaptive control are the works of Davis (1989), Julstrom (1995) and Smith and Smuda (1995). Finally, self-adaptive control is based on the idea that evolution can also be applied to the search for good parameter values. In this type of control, the operator probabilities are encoded together with the corresponding solution and undergo recombination and mutation. Self-adaptive evolution strategies (Beyer and Schwefel, 2002) are an example of this type of parameter control.
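As an illustration of self-adaptive control (our sketch, not taken from the paper), the following minimal (1+1) evolution strategy encodes the mutation step size together with the solution, so the strategy parameter itself undergoes mutation; the function name and constants are assumptions made for this example.

```python
import math
import random

def self_adaptive_es(fitness, x0, sigma0=1.0, generations=500, tau=0.3):
    """Minimal (1+1) evolution strategy with self-adaptive control:
    the step size sigma is stored with the solution and is itself
    mutated, so no deterministic schedule or feedback rule is needed."""
    x, sigma = x0, sigma0
    best = fitness(x)
    for _ in range(generations):
        # Mutate the strategy parameter first (log-normal rule) ...
        child_sigma = sigma * math.exp(tau * random.gauss(0.0, 1.0))
        # ... then mutate the solution using the child's own step size.
        child_x = x + child_sigma * random.gauss(0.0, 1.0)
        f = fitness(child_x)
        if f <= best:  # minimization: the child replaces the parent
            x, sigma, best = child_x, child_sigma, f
    return x, best

# Usage: minimize (x - 3)^2; sigma adapts during the run.
x, f = self_adaptive_es(lambda v: (v - 3.0) ** 2, x0=0.0)
```

Because a successful child carries its own sigma forward, step sizes that produce improvements tend to survive, which is exactly the self-adaptation idea described above.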
In the model selection approach, a model selection criterion can be used to select parameters and, more generally, to compare and choose among models which may have different capacities or competences. When there is only a single parameter, one can easily explore how its value affects the model selection criterion: typically one tries a finite number of values of the parameter and picks the one which gives the lowest value of the criterion. Most model selection criteria have been proposed for selecting a single parameter that controls the "complexity" of the class of functions in which the learning algorithm finds a solution, e.g., structural risk minimization (Vapnik, 1982), the Akaike (1974) Information Criterion, or the generalized cross-validation criterion (Craven and Wahba, 1979). Other criteria are based on held-out data, such as cross-validation estimates of the generalization error (Kohavi and John, 1995). These are almost unbiased estimates of the generalization error (Vapnik, 1982) obtained by testing on data not used to choose the parameter. This approach is less applicable when several parameters must be optimized simultaneously; in that case, the conditions which must be satisfied are more restrictive (Bengio, 2000).
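The single-parameter search just described can be sketched as a grid search driven by a cross-validation estimate of the generalization error. The helper names below are ours, and the k-nearest-neighbour regressor is only a stand-in for whatever learning algorithm is being tuned.

```python
import random

def cv_error(train_fn, error_fn, data, param, k=5):
    """Mean held-out error of the model trained with one parameter value,
    estimated by k-fold cross-validation."""
    folds = [data[i::k] for i in range(k)]
    fold_errors = []
    for i in range(k):
        held_out = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = train_fn(train, param)
        fold_errors.append(sum(error_fn(model, d) for d in held_out) / len(held_out))
    return sum(fold_errors) / k

def select_parameter(train_fn, error_fn, data, grid, k=5):
    """Try a finite number of values and keep the one with the lowest score."""
    return min(grid, key=lambda p: cv_error(train_fn, error_fn, data, p, k))

# Stand-in learner: k-nearest-neighbour regression; the tuned parameter
# is the neighbourhood size, which controls the model's complexity.
def train_knn(train_set, n_neighbours):
    return (list(train_set), n_neighbours)

def knn_error(model, point):
    points, n = model
    x, y = point
    nearest = sorted(points, key=lambda p: abs(p[0] - x))[:n]
    prediction = sum(p[1] for p in nearest) / len(nearest)
    return (prediction - y) ** 2

random.seed(1)
data = [(i / 10.0, 2.0 * (i / 10.0) + random.gauss(0.0, 0.5)) for i in range(50)]
best = select_parameter(train_knn, knn_error, data, grid=[1, 3, 5, 9, 15])
```

The held-out folds play the role of the almost unbiased error estimate: each candidate value is scored only on data that were not used to train with it.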
Finally, the general goal of the statistical approach (or statistical estimation theory) is to estimate some unknown parameter from the observation of a set of random variables, referred to as "the data". Maximum likelihood (ML), maximum a posteriori (MAP), expectation maximization (EM) and hidden Markov models (HMMs) are examples of estimation techniques. Briefly, ML is a classical technique which estimates a parameter directly from the data, assuming that the data are distributed according to some parameterized probability distribution function (Jeffreys, 1983). The MAP technique is a Bayesian approach, i.e., a priori statistical information available on the unknown parameters is also exploited for their estimation (DeGroot, 1970). When the data are assumed to be drawn from a mixture of parameterized probability distribution functions, where not just the parameters but also the mixture components have to be estimated, the EM algorithm is useful (Dempster et al., 1977; Redner and Walker, 1984; Jordan and Jacobs, 1994). Finally, HMMs assume that the mixtures evolve in time according to a Markov process (Rabiner, 1989).
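To make the ML/MAP distinction concrete, here is a toy sketch (ours, not from the paper) estimating a Bernoulli parameter from coin-flip data; the MAP version assumes a Beta(alpha, beta) prior, whose posterior mode has a closed form.

```python
def ml_estimate(heads, n):
    """Maximum likelihood: the parameter is taken directly from the data."""
    return heads / n

def map_estimate(heads, n, alpha=2.0, beta=2.0):
    """Maximum a posteriori under a Beta(alpha, beta) prior: the posterior
    mode. The prior acts like (alpha - 1) extra heads and (beta - 1) extra
    tails, pulling the estimate toward prior knowledge when data are scarce."""
    return (heads + alpha - 1.0) / (n + alpha + beta - 2.0)

# Three heads in three flips: ML says theta = 1.0, while the prior
# keeps the MAP estimate at (3 + 1) / (3 + 2) = 0.8.
theta_ml = ml_estimate(3, 3)
theta_map = map_estimate(3, 3)
```

With more data the two estimates converge, which reflects the diminishing influence of the prior in the Bayesian approach.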
In this paper we suggest the use of a meta-model that generates recommendations about the best parameter values for the model of interest M and that differs from other systems in that it is:

- general, in the sense of being a system that can be applied to any model M;
- automatic, in the sense of being a system built from a set of data, without any supervision by the model user.
In its most general sense, meta-modeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems. In the context of this work, a model can be viewed as an abstraction of phenomena in the real world, and a meta-model is yet another abstraction, highlighting properties of the model itself.
The rest of the work is focused on defining such a system for parameter setting. As will be shown in the following sections, Bayesian networks (BNs) are identified as a suitable formalism for defining the meta-model. Moreover, the proposed model has the added benefit of removing the need for, or at least reducing the effect of, user decisions about parameter values.
The structure of the paper is as follows. Section 2 is devoted to describing the proposed model for parameter setting. In Section 3, notions about BNs are briefly presented, including learning from databases, and we outline their suitability for building a framework for parameter setting. After this, Section 4 describes the problem used throughout the paper to illustrate the application of the proposed adjustment model, the experiments carried out to evaluate it and the results attained. Finally, in Section 5 some conclusions and lines of future work are presented.
Section snippets
Adjustment model
Let M be a model used to solve a problem P, with a set of model parameters or characteristics that the user must set up. We distinguish two types of model parameters: external and internal. The first type are parameters that the user must fix to execute the model, whereas the second type are parameters established and updated during the model learning process. For example, the weights of an artificial neural network are parameters established in the training phase and,
Bayesian networks
BNs are one of the most important frameworks for representing and reasoning with probabilistic models and they have been applied to many real-world problems (Castillo et al., 1997; Jensen, 2001; Pearl, 1998).
Throughout the paper, any domain variable will be denoted by an upper-case letter (e.g., X) and its values or configurations will be represented by the same letter in lower case (e.g., x). Thus, the expression P(X = x), in short P(x), must be understood as the probability
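As a hypothetical two-variable illustration of this notation (our example, not a network from the paper), a BN with an arc A → B stores P(A) and P(B | A), and every joint probability follows from the factorization P(a, b) = P(a) P(b | a):

```python
# Conditional probability tables of a toy network A -> B
# (the probability values are chosen for illustration only).
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {
    (True, True): 0.9, (True, False): 0.1,
    (False, True): 0.2, (False, False): 0.8,
}

def joint(a, b):
    """Chain-rule factorization encoded by the network: P(a, b) = P(a) P(b|a)."""
    return p_a[a] * p_b_given_a[(a, b)]

def marginal_b(b):
    """Marginalize the parent out: P(b) = sum over a of P(a, b)."""
    return sum(joint(a, b) for a in (True, False))

# marginal_b(True) = 0.3 * 0.9 + 0.7 * 0.2 = 0.41
```

The same factorization generalizes to larger networks: each variable contributes one factor, conditioned on its parents.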
Evaluation of the adjustment model
As a very suitable test bed for measuring the quality of the proposed model, the field of geometric problems was selected. Specifically, this section focuses on illustrating the usefulness of the adjustment model for finding the right parameter values in GAs that operate as a selection mechanism in constructive geometric constraint solvers. The same case study was addressed by Barreiro et al. (2004), and their results are used to evaluate how good this adjustment model is.
Conclusions and future work
In this paper a meta-model for parameter setting based on BNs has been presented. The adjustment model has been designed with the aim of reducing the effort of the user who needs to optimize the parameters of the model which solves a problem P. The main novelty of the proposed model resides in the fact that it is general, i.e., domain independent, and automatic, that is to say, no user supervision is necessary. Another advantage is that the proposed model is easy to implement.
It was
Acknowledgments
Victoria Luzón has been partially supported by the Ministerio de Educación y Ciencia and by FEDER under Grant TIN2004-06326-C03-01. The authors are grateful to Barreiro for providing us with the necessary data for this work.
References (49)

- Sketch-based pruning of a solution space within a formal geometric constraint solver. Artificial Intelligence Journal (2000).
- Case-based reasoning: foundational issues, methodological variations and system approaches. AI Communications (1994).
- Akaike, 1974. A new look at statistical model identification. IEEE Transactions on Automatic Control.
- Handbook of Evolutionary Computation (1997).
- Baker, J., 1987. Reducing bias and inefficiency in the selection algorithm. In: Proceedings of the 2nd International...
- Barreiro, E., Joan-Arinyo, R., Luzón, M., 2004. Algoritmos genéticos en el problema de la solución deseada [Genetic algorithms in the desired-solution problem]....
- Bengio, 2000. Gradient-based optimization of hyper-parameters. Neural Computation.
- Bayesian Theory (1994).
- Beyer and Schwefel, 2002. Evolution strategies: a comprehensive introduction. Natural Computing.
- Adaptive probabilistic networks with hidden variables. Machine Learning (1997).
- Operations for learning with graphical models. Journal of Artificial Intelligence Research.
- A Bayesian method for the induction of probabilistic networks from data. Machine Learning.
- Probabilistic Networks and Expert Systems.
- Craven and Wahba, 1979. Smoothing noisy data with spline functions. Numerical Mathematics.
- DeGroot, 1970. Optimal Statistical Decisions.
- Dempster et al., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society.
- Duda et al., 2001. Pattern Classification.
- Eiben et al., 1999. Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation.
- Fletcher, 2000. Practical Methods of Optimization.
- Fogel, 1999. Artificial Intelligence through Simulated Evolution. Forty Years of Evolutionary Programming.