A model for parameter setting based on Bayesian networks

https://doi.org/10.1016/j.engappai.2007.02.013

Abstract

One of the difficulties that the user faces when using a model to solve a problem is that, before running the model, a set of parameter values has to be specified. Deciding on an appropriate set of parameter values is not an easy task. Over the years, several standard optimization methods, as well as various alternative approaches tailored to the problem at hand, have been proposed for parameter setting. These techniques have their merits and drawbacks, but they usually have a fairly restricted application range, lacking generality or requiring user supervision. This paper proposes a meta-model that generates recommendations about the best parameter values for the model of interest. Its main characteristic is that it is an automatic meta-model that can be applied to any model. For evaluation purposes, and in order to be able to compare our results with those obtained by others, a real geometric problem was selected. The experiments show the validity of the proposed adjustment model.

Introduction

There are a variety of situations in which a researcher is faced with a modeling problem. Modeling is a process through which a model M (a function or algorithm) is constructed to explain the behavior of a system and to predict unknown answers. Once the model of the studied system has been established based on a set of parameters Θ, the parameter values that make the model generate the best results must be determined.

In this work, we are interested in the parameter setting of models used in the artificial intelligence (AI) field, such as artificial neural networks, genetic algorithms (GAs), clustering algorithms and so on (Duda et al., 2001). Parameter setting of a model M(Θ) can be carried out either by applying mechanisms specific to the model (Friedrichs and Igel, 2004) or by applying general optimization techniques used in system modeling (Fletcher, 2000, Rao, 1996). In many other cases, choosing the parameters involves consulting the specialized technical literature or simply resorting to trial and error.

This work is useful in the engineering field, where a variety of AI models and techniques are used to solve a whole range of problems but are often not known in depth. Our work could help in these situations. As an example to illustrate its usefulness, a graphic design problem was selected (the root identification problem).

On closer analysis, three approaches to the problem of parameter setting in AI can be distinguished: the evolutionary approach (used by evolutionary algorithms: GAs (Goldberg, 1989); evolution strategies (Schwefel, 1995); and evolutionary programming (Fogel, 1999)), the model selection approach and the statistical approach.

The evolutionary approach is based on adapting the parameters during the evolutionary algorithm run. Parameter control techniques can be sub-divided into three types: deterministic, adaptive, and self-adaptive (Eiben et al., 1999). In deterministic control, the parameters are changed according to deterministic rules without using any feedback from the search. Adaptive control takes place when there is some form of feedback that influences the parameter specification. Examples of adaptive control are the works of Davis (1989), Julstrom (1995) and Smith and Smuda (1995). Finally, self-adaptive control is based on the idea that evolution can also be applied to the search for good parameter values. In this type of control, the operator probabilities are encoded together with the corresponding solution and undergo recombination and mutation. Self-adaptive evolution strategies (Beyer and Schwefel, 2002) are an example of the application of this type of parameter control.
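Self-adaptive control as described above can be sketched in a few lines. The following is a minimal illustration under our own assumptions (the function names, constants, and the simple (1+1) selection scheme are ours, not taken from the cited works): the mutation step size is encoded alongside the candidate solution and is itself mutated, so good step sizes survive selection implicitly.

```python
import math
import random

def self_adaptive_es(fitness, x0, sigma0=1.0, generations=200, tau=0.3):
    """Minimal (1+1) evolution strategy with a self-adaptive step size.

    The strategy parameter sigma is encoded with the solution and
    undergoes a log-normal mutation itself; candidates carrying a
    well-suited sigma tend to produce better children, so good step
    sizes are selected for implicitly (self-adaptive control).
    """
    x, sigma = x0, sigma0
    for _ in range(generations):
        # Mutate the strategy parameter first, then the solution with it.
        child_sigma = sigma * math.exp(tau * random.gauss(0.0, 1.0))
        child_x = x + child_sigma * random.gauss(0.0, 1.0)
        # Plus-selection: the child replaces the parent only if no worse.
        if fitness(child_x) <= fitness(x):
            x, sigma = child_x, child_sigma
    return x, sigma

random.seed(1)  # fixed seed, for a reproducible illustration
# Minimise f(x) = (x - 3)^2; the optimum is x = 3.
best, step = self_adaptive_es(lambda x: (x - 3.0) ** 2, x0=10.0)
```

Note that no externally scheduled step-size rule is needed: the step size adapts as a side effect of selection, which is precisely what distinguishes self-adaptive from deterministic control.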

In the model selection approach, a model selection criterion can be used to select parameters or, more generally, to compare and choose among models which may have different capacities or competences. When there is only a single parameter, one can easily explore how its value affects the model selection criterion: typically, one tries a finite number of values of the parameter and picks the one which gives the lowest value of the criterion. Most model selection criteria have been proposed for selecting a single parameter that controls the “complexity” of the class of functions in which the learning algorithm finds a solution, e.g., structural risk minimization (Vapnik, 1982), the Akaike (1974) Information Criterion, or the generalized cross-validation criterion (Craven and Wahba, 1979). Other criteria are based on held-out data, such as cross-validation estimates of generalization error (Kohavi and John, 1995). These are almost unbiased estimates of generalization error (Vapnik, 1982) obtained by testing M(Θ) on data not used to choose the parameter Θ. This approach is less applicable when several parameters must be optimized simultaneously; in that case, the conditions which must be satisfied are more restrictive (Bengio, 2000).
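The single-parameter procedure described above (try a finite set of values and keep the one with the lowest criterion value) can be sketched as follows. This is a generic illustration with made-up helper names, not code from the paper; k-fold cross-validation on held-out data plays the role of the model selection criterion.

```python
import random

def cv_error(train_and_test, data, param, k=5):
    """k-fold cross-validation estimate of generalization error for one
    parameter value: each fold is evaluated on data not used for fitting,
    which is what makes the estimate almost unbiased."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(train_and_test(train, held_out, param))
    return sum(errors) / k

def select_parameter(train_and_test, data, candidates):
    """Try a finite number of candidate values and pick the one giving
    the lowest value of the selection criterion."""
    return min(candidates, key=lambda p: cv_error(train_and_test, data, p))

# Toy usage: estimate a mean with shrinkage lam * mean(train); lam plays
# the role of a single "complexity" parameter controlling the estimator.
random.seed(0)
data = [5.0 + random.gauss(0.0, 0.2) for _ in range(50)]

def shrunken_mean_error(train, held_out, lam):
    pred = lam * (sum(train) / len(train))
    return sum((y - pred) ** 2 for y in held_out) / len(held_out)

best_lam = select_parameter(shrunken_mean_error, data, [0.0, 0.5, 1.0])
```

With several parameters, the candidate set grows combinatorially, which illustrates why this approach is less applicable in the multi-parameter case.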

Finally, the general goal of the statistical approach (or statistical estimation theory) is to estimate some unknown parameter from the observation of a set of random variables, referred to as “the data”. Maximum likelihood (ML), maximum a posteriori (MAP), expectation maximization (EM) and hidden Markov models (HMMs) are examples of estimation techniques. Briefly, ML is a classical technique which estimates a parameter directly from the data, assuming that the data are distributed according to some parameterized probability distribution function (Jeffreys, 1983). The MAP technique is a Bayesian approach, i.e., a priori statistical information available on the unknown parameters is also exploited for their estimation (DeGroot, 1970). When the data are assumed to be drawn from a mixture of parameterized probability distribution functions, where not just the parameters but also the mixture components have to be estimated, the EM algorithm is useful (Dempster et al., 1977, Redner and Walker, 1984, Jordan and Jacobs, 1994). Lastly, HMMs assume that the mixtures evolve in time according to a Markov process (Rabiner, 1989).
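As a hedged illustration of the difference between ML and MAP estimation (the Bernoulli setting and the Beta(2, 2) prior are our own choices, made for brevity): the ML estimate is the raw sample frequency, while the MAP estimate also exploits a priori information through a prior distribution over the parameter.

```python
def ml_bernoulli(successes, trials):
    """Maximum likelihood estimate of a Bernoulli parameter: the value
    maximising the likelihood is simply the sample frequency."""
    return successes / trials

def map_bernoulli(successes, trials, alpha=2.0, beta=2.0):
    """Maximum a posteriori estimate under a Beta(alpha, beta) prior:
    the posterior is Beta(successes + alpha, failures + beta), and its
    mode pulls the estimate away from the raw frequency toward the
    prior -- the Bayesian flavour of the MAP technique."""
    return (successes + alpha - 1.0) / (trials + alpha + beta - 2.0)

# 3 successes in 4 trials: ML gives 0.75, while a Beta(2, 2) prior
# (a mild belief that the parameter is near 0.5) gives 4/6 ~ 0.667.
ml_est = ml_bernoulli(3, 4)
map_est = map_bernoulli(3, 4)
```

As the number of trials grows, the influence of the prior vanishes and the MAP estimate converges to the ML estimate.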

In this paper we suggest the use of a meta-model M* that generates recommendations about the best parameter values Θ for the model of interest M and that differs from other systems in that it is:

  • general, in the sense of being a system that can be applied to any model M(Θ);

  • automatic, in the sense of being a system built from a set of data, without any model user supervision.

In its most general sense, meta-modeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems. In the context of this work, a model can be viewed as an abstraction of phenomena in the real world, and a meta-model is yet another abstraction, highlighting properties of the model itself.

The rest of the work focuses on defining such a system for parameter setting. As will be shown in the following sections, Bayesian networks (BNs) are identified as a suitable formalism for defining the meta-model M*. Moreover, the proposed model has the added benefit of removing the need for, or at least reducing the effect of, user decisions about parameter values.

The structure of the paper is as follows. Section 2 is devoted to describing the proposed model for parameter setting. In Section 3, notions about BNs are briefly presented, including learning from databases, and we outline their suitability for building a framework for parameter setting. After this, Section 4 describes the problem used throughout the paper to illustrate the application of the proposed adjustment model, the experiments carried out to evaluate it and the results attained. Finally, in Section 5 some conclusions and lines of future work are presented.

Adjustment model

Let M(Θ) be a model used to solve a problem P, where Θ is the set of model parameters or characteristics that the user must set. We distinguish two types of model parameters: external and internal. The first type are parameters that the user must fix in order to execute the model, whereas the second type are parameters established and updated in the model learning process. For example, the weights of an artificial neural network are parameters established in the training phase and,
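The external/internal distinction can be illustrated with a hypothetical model class (the class, its constructor arguments and the AND-function example are our own invention, used only for illustration): external parameters must be fixed by the user before execution, while internal parameters are established and updated by the learning process itself.

```python
class PerceptronModel:
    """Toy model M(Theta) illustrating the two kinds of parameters.

    External parameters (learning_rate, epochs) must be fixed by the
    user before execution -- these are the ones an adjustment model
    would recommend. Internal parameters (the weights) are established
    and updated by the training process itself.
    """

    def __init__(self, learning_rate, epochs):  # external parameters
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None                     # internal parameters

    def fit(self, xs, ys):
        n = len(xs[0])
        self.weights = [0.0] * (n + 1)          # n weights plus a bias
        for _ in range(self.epochs):
            for x, y in zip(xs, ys):
                pred = 1 if self._score(x) > 0 else 0
                err = y - pred
                # Classic perceptron update rule on the internal parameters.
                for i in range(n):
                    self.weights[i] += self.learning_rate * err * x[i]
                self.weights[n] += self.learning_rate * err
        return self

    def _score(self, x):
        # zip stops at len(x), so the bias term self.weights[-1] is added once.
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.weights[-1]

# Usage: learn the AND function (linearly separable, so the rule converges).
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ys = [0, 0, 0, 1]
model = PerceptronModel(learning_rate=0.1, epochs=20).fit(xs, ys)
```

Here the adjustment model would only be concerned with `learning_rate` and `epochs`; the weights are out of its scope because training determines them.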

Bayesian networks

BNs are one of the most important frameworks for representing and reasoning with probabilistic models and they have been applied to many real-world problems (Castillo et al., 1997, Jensen, 2001, Pearl, 1998).

Throughout the paper, any domain variable will be denoted by an upper-case letter (e.g., X,Y,Xi,Θ) and its values or configurations will be represented by the same letter in lower case (e.g., x,y,xi,θ). Thus, the expression P(Xi=xi), in short P(xi), must be understood as the probability
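Under this notation, the key property of a BN is that the joint probability factorizes into local conditional tables, one per node. The following toy network (our own illustrative example, the classic rain/sprinkler/wet-grass structure with made-up probabilities, not the paper's meta-model) shows the factorization and a posterior computed by brute-force enumeration:

```python
# Toy BN: Rain -> WetGrass <- Sprinkler (R and S are parentless).
def joint(r, s, w):
    """Joint probability P(R=r, S=s, W=w) as the product of the local
    conditional tables -- exactly the factorization a BN encodes."""
    p_r = {True: 0.2, False: 0.8}                 # P(R)
    p_s = {True: 0.1, False: 0.9}                 # P(S)
    p_w_true = {                                  # P(W=true | R, S)
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.8, (False, False): 0.0,
    }
    pw = p_w_true[(r, s)] if w else 1.0 - p_w_true[(r, s)]
    return p_r[r] * p_s[s] * pw

def posterior_rain_given_wet():
    """P(R=true | W=true) by brute-force enumeration over S."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
    return num / den

p_rain = posterior_rain_given_wet()
```

Brute-force enumeration is exponential in the number of variables; real BN engines exploit the factorization with dedicated inference algorithms, but the quantity computed is the same.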

Evaluation of the adjustment model

As a very suitable test bed for measuring the quality of the proposed model, the field of geometric problems was selected. Specifically, this section focuses on illustrating the usefulness of the adjustment model for finding the right parameter values in GAs that operate as a selection mechanism in constructive geometric constraint solvers. The same case study was addressed by Barreiro et al. (2004), and their results are used to evaluate how good this adjustment model is.

Conclusions and future work

In this paper a meta-model for parameter setting based on BNs has been presented. The adjustment model has been designed with the aim of reducing the effort of a user who needs to optimize the parameters Θ of a model M(Θ) which solves a problem P. The main novelty of the proposed model resides in the fact that it is general, i.e., domain independent, and automatic, that is to say, no user supervision is necessary. Another advantage is that the proposed model is easy to implement.

It was

Acknowledgments

Victoria Luzón has been partially supported by the Ministerio de Educación y Ciencia and by FEDER under Grant TIN2004-06326-C03-01. The authors are grateful to Barreiro for providing us with the necessary data for this work.

References (49)

  • C. Essert-Villard et al.

    Sketch-based pruning of a solution space within a formal geometric constraint solver

    Artificial Intelligence Journal

    (2000)
  • A. Aamodt et al.

    Case-based reasoning: foundational issues, methodological variations and system approaches

    AI Communications

    (1994)
  • H. Akaike

    A new look at statistical model identification

    IEEE Transactions on Automatic Control

    (1974)
  • T. Bäck et al.

    Handbook of Evolutionary Computation

    (1997)
  • Baker, J., 1987. Reducing bias and inefficiency in the selection algorithm. in: Proceedings of the 2nd International...
  • Barreiro, E., Joan-Arinyo, R., Luzón, M., 2004. Algoritmos genéticos en el problema de la solución deseada....
  • Y. Bengio

    Gradient-based optimization of hyper-parameters

    Neural Computation

    (2000)
  • J. Bernardo et al.

    Bayesian Theory

    (1994)
  • H. Beyer et al.

    Evolution strategies: a comprehensive introduction

    Natural Computing

    (2002)
  • J. Binder et al.

    Adaptive probabilistic networks with hidden variables

    Machine Learning

    (1997)
  • Buntine, W., 1991. Theory refinement on Bayesian networks. In: Proceedings of the 7th Conference on Uncertainty in...
  • W. Buntine

    Operations for learning with graphical models

    Journal of Artificial Intelligence Research

    (1994)
  • Castillo, E., Gutierrez, J., Hadi, A., 1997. Expert Systems and Probabilistic Network Models, first ed. Springer,...
  • G. Cooper et al.

    A Bayesian method for the induction of probabilistic networks from data

    Machine Learning

    (1992)
  • R. Cowell et al.

    Probabilistic Networks and Expert Systems

    (1999)
  • P. Craven et al.

    Smoothing noisy data with spline functions

Numerische Mathematik

    (1979)
  • Davis, L., 1989. Adapting operator probabilities in genetic algorithms, In: Schaffer, J.D. (Ed.), Proceedings of the...
  • M. DeGroot

    Optimal Statistical Decisions

    (1970)
  • A. Dempster et al.

    Maximum likelihood from incomplete data via the EM algorithm

    Journal of the Royal Statistical Society

    (1977)
  • R. Duda et al.

    Pattern Classification

    (2001)
  • A. Eiben et al.

    Parameter control in evolutionary algorithms

    IEEE Transactions on Evolutionary Computation

    (1999)
  • R. Fletcher

    Practical Methods of Optimization

    (2000)
  • L. Fogel

    Artificial Intelligence through Simulated Evolution. Forty Years of Evolutionary Programming

    (1999)
  • Friedrichs, F., Igel, C., 2004. Evolutionary tuning of multiple svm parameters. In: Proceedings of the 12th European...