A model for parameter setting based on Bayesian networks

https://doi.org/10.1016/j.engappai.2007.02.013

Abstract

One of the difficulties that the user faces when using a model to solve a problem is that, before running the model, a set of parameter values has to be specified. Deciding on an appropriate set of parameter values is not an easy task. Over the years, several standard optimization methods, as well as various alternative approaches tailored to the problem at hand, have been proposed for parameter setting. These techniques have their merits and drawbacks, but they usually have a fairly restricted application range, lacking generality or requiring user supervision. This paper proposes a meta-model that generates recommendations about the best parameter values for the model of interest. Its main characteristic is that it is an automatic meta-model that can be applied to any model. For evaluation purposes, and in order to be able to compare our results with those obtained by others, a real geometric problem was selected. The experiments show the validity of the proposed adjustment model.

Introduction

There are a variety of situations in which a researcher is faced with a modeling problem. Modeling is a process through which a model M (a function or algorithm) is constructed to explain the behavior of a system and to predict unknown answers. Once the model of the studied system has been established based on a set of parameters Θ, the parameter values that make the model generate the best results must be determined.

In this work, we are interested in the parameter setting of models used in the artificial intelligence (AI) field, such as artificial neural networks, genetic algorithms (GAs), clustering algorithms and so on (Duda et al., 2001). Parameter setting of a model M(Θ) can be carried out either by applying mechanisms specific to the model (Friedrichs and Igel, 2004) or by applying general optimization techniques used in system modeling (Fletcher, 2000, Rao, 1996). In many other cases, choosing the parameters involves consulting the specialized technical literature or simply resorting to trial and error.

This work is useful in the engineering field, where a variety of AI models and techniques are used to solve a whole range of problems but are often not known in depth. Our work could help in these situations. As an example to illustrate its usefulness, a graphic design problem was selected (the root identification problem).

On closer analysis, three approaches to the problem of parameter setting in AI can be distinguished: the evolutionary approach (used by evolutionary algorithms: GAs (Goldberg, 1989); evolution strategies (Schwefel, 1995); and evolutionary programming (Fogel, 1999)), the model selection approach and the statistical approach.

The evolutionary approach is based on adapting the parameters during the evolutionary algorithm run. Parameter control techniques can be sub-divided into three types: deterministic, adaptive, and self-adaptive (Eiben et al., 1999). In deterministic control, the parameters are changed according to deterministic rules without using any feedback from the search. Adaptive control takes place when there is some form of feedback that influences the parameter specification. Examples of adaptive control are the works of Davis (1989), Julstrom (1995) and Smith and Smuda (1995). Finally, self-adaptive control is based on the idea that evolution can also be applied to the search for good parameter values. In this type of control, the operator probabilities are encoded together with the corresponding solution and undergo recombination and mutation. Self-adaptive evolution strategies (Beyer and Schwefel, 2002) are an example of the application of this type of parameter control.
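Self-adaptive control as described above can be sketched in a few lines. The following is a minimal illustration under our own assumptions (the function names, constants, and the simple (1+1) selection scheme are ours, not taken from the cited works): the mutation step size is encoded alongside the candidate solution and is itself mutated, so good step sizes survive selection implicitly.

```python
import math
import random

def self_adaptive_es(fitness, x0, sigma0=1.0, generations=200, tau=0.3):
    """Minimal (1+1) evolution strategy with a self-adaptive step size.

    The strategy parameter sigma is encoded with the solution and
    undergoes a log-normal mutation itself; candidates carrying a
    well-suited sigma tend to produce better children, so good step
    sizes are selected for implicitly (self-adaptive control).
    """
    x, sigma = x0, sigma0
    for _ in range(generations):
        # Mutate the strategy parameter first, then the solution with it.
        child_sigma = sigma * math.exp(tau * random.gauss(0.0, 1.0))
        child_x = x + child_sigma * random.gauss(0.0, 1.0)
        # Plus-selection: the child replaces the parent only if no worse.
        if fitness(child_x) <= fitness(x):
            x, sigma = child_x, child_sigma
    return x, sigma

random.seed(1)  # fixed seed, for a reproducible illustration
# Minimise f(x) = (x - 3)^2; the optimum is x = 3.
best, step = self_adaptive_es(lambda x: (x - 3.0) ** 2, x0=10.0)
```

Note that no externally scheduled step-size rule is needed: the step size adapts as a side effect of selection, which is precisely what distinguishes self-adaptive from deterministic control.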

In the model selection approach, a model selection criterion can be used to select parameters or, more generally, to compare and choose among models which may have different capacities or competences. When there is only a single parameter, one can easily explore how its value affects the model selection criterion: typically, one tries a finite number of values of the parameter and picks the one which gives the lowest value of the criterion. Most model selection criteria have been proposed for selecting a single parameter that controls the “complexity” of the class of functions in which the learning algorithm finds a solution, e.g., structural risk minimization (Vapnik, 1982), the Akaike (1974) Information Criterion, or the generalized cross-validation criterion (Craven and Wahba, 1979). Other criteria are based on held-out data, such as cross-validation estimates of generalization error (Kohavi and John, 1995). These are almost unbiased estimates of generalization error (Vapnik, 1982) obtained by testing M(Θ) on data not used to choose the parameter Θ. This approach is less applicable when several parameters must be optimized simultaneously; in that case, the conditions which must be satisfied are more restrictive (Bengio, 2000).
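The single-parameter procedure described above (try a finite set of values and keep the one with the lowest criterion value) can be sketched as follows. This is a generic illustration with made-up helper names, not code from the paper; k-fold cross-validation on held-out data plays the role of the model selection criterion.

```python
import random

def cv_error(train_and_test, data, param, k=5):
    """k-fold cross-validation estimate of generalization error for one
    parameter value: each fold is evaluated on data not used for fitting,
    which is what makes the estimate almost unbiased."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(train_and_test(train, held_out, param))
    return sum(errors) / k

def select_parameter(train_and_test, data, candidates):
    """Try a finite number of candidate values and pick the one giving
    the lowest value of the selection criterion."""
    return min(candidates, key=lambda p: cv_error(train_and_test, data, p))

# Toy usage: estimate a mean with shrinkage lam * mean(train); lam plays
# the role of a single "complexity" parameter controlling the estimator.
random.seed(0)
data = [5.0 + random.gauss(0.0, 0.2) for _ in range(50)]

def shrunken_mean_error(train, held_out, lam):
    pred = lam * (sum(train) / len(train))
    return sum((y - pred) ** 2 for y in held_out) / len(held_out)

best_lam = select_parameter(shrunken_mean_error, data, [0.0, 0.5, 1.0])
```

With several parameters, the candidate set grows combinatorially, which illustrates why this approach is less applicable in the multi-parameter case.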

Finally, the general goal of the statistical approach (or statistical estimation theory) is to estimate some unknown parameter from the observation of a set of random variables, referred to as “the data”. Maximum likelihood (ML), maximum a posteriori (MAP), expectation maximization (EM) and hidden Markov models (HMMs) are examples of estimation techniques. Briefly, ML is a classical technique which estimates a parameter directly from the data, assuming that the data are distributed according to some parameterized probability distribution function (Jeffreys, 1983). The MAP technique is a Bayesian approach, i.e., a priori statistical information available on the unknown parameters is also exploited for their estimation (DeGroot, 1970). When the data are assumed to be drawn from a mixture of parameterized probability distribution functions, where not just the parameters but also the mixture components have to be estimated, the EM algorithm is useful (Dempster et al., 1977, Redner and Walker, 1984, Jordan and Jacobs, 1994). Lastly, HMMs assume that the mixtures evolve in time according to a Markov process (Rabiner, 1989).
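As a hedged illustration of the difference between ML and MAP estimation (the Bernoulli setting and the Beta(2, 2) prior are our own choices, made for brevity): the ML estimate is the raw sample frequency, while the MAP estimate also exploits a priori information through a prior distribution over the parameter.

```python
def ml_bernoulli(successes, trials):
    """Maximum likelihood estimate of a Bernoulli parameter: the value
    maximising the likelihood is simply the sample frequency."""
    return successes / trials

def map_bernoulli(successes, trials, alpha=2.0, beta=2.0):
    """Maximum a posteriori estimate under a Beta(alpha, beta) prior:
    the posterior is Beta(successes + alpha, failures + beta), and its
    mode pulls the estimate away from the raw frequency toward the
    prior -- the Bayesian flavour of the MAP technique."""
    return (successes + alpha - 1.0) / (trials + alpha + beta - 2.0)

# 3 successes in 4 trials: ML gives 0.75, while a Beta(2, 2) prior
# (a mild belief that the parameter is near 0.5) gives 4/6 ~ 0.667.
ml_est = ml_bernoulli(3, 4)
map_est = map_bernoulli(3, 4)
```

As the number of trials grows, the influence of the prior vanishes and the MAP estimate converges to the ML estimate.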

In this paper we suggest the use of a meta-model M* that generates recommendations about the best parameter values Θ for the model of interest M and that differs from other systems in that it is:

  • general, in the sense of being a system that can be applied to any model M(Θ);

  • automatic, in the sense of being a system built from a set of data, without any model user supervision.

In its most general sense, meta-modeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems. In the context of this work, a model can be viewed as an abstraction of phenomena in the real world, and a meta-model is yet another abstraction, highlighting properties of the model itself.

The rest of the work focuses on defining such a system for parameter setting. As will be shown in the following sections, Bayesian networks (BNs) are identified as a suitable formalism for defining the meta-model M*. Moreover, the proposed model has the added benefit of removing the need for, or at least reducing the effect of, user decisions about parameter values.

The structure of the paper is as follows. Section 2 is devoted to describing the proposed model for parameter setting. In Section 3, notions about BNs are briefly presented, including learning from databases, and we outline their suitability for building a framework for parameter setting. After this, Section 4 describes the problem used throughout the paper to illustrate the application of the proposed adjustment model, the experiments carried out to evaluate it and the results attained. Finally, in Section 5 some conclusions and lines of future work are presented.

Adjustment model

Let M(Θ) be a model used to solve a problem P, where Θ is the set of model parameters or characteristics that the user must set. We distinguish two types of model parameters: external and internal. The first type are parameters that the user must fix in order to execute the model, whereas the second type are parameters established and updated in the model learning process. For example, the weights of an artificial neural network are parameters established in the training phase and,
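The external/internal distinction can be illustrated with a hypothetical model class (the class, its constructor arguments and the AND-function example are our own invention, used only for illustration): external parameters must be fixed by the user before execution, while internal parameters are established and updated by the learning process itself.

```python
class PerceptronModel:
    """Toy model M(Theta) illustrating the two kinds of parameters.

    External parameters (learning_rate, epochs) must be fixed by the
    user before execution -- these are the ones an adjustment model
    would recommend. Internal parameters (the weights) are established
    and updated by the training process itself.
    """

    def __init__(self, learning_rate, epochs):  # external parameters
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights = None                     # internal parameters

    def fit(self, xs, ys):
        n = len(xs[0])
        self.weights = [0.0] * (n + 1)          # n weights plus a bias
        for _ in range(self.epochs):
            for x, y in zip(xs, ys):
                pred = 1 if self._score(x) > 0 else 0
                err = y - pred
                # Classic perceptron update rule on the internal parameters.
                for i in range(n):
                    self.weights[i] += self.learning_rate * err * x[i]
                self.weights[n] += self.learning_rate * err
        return self

    def _score(self, x):
        # zip stops at len(x), so the bias term self.weights[-1] is added once.
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.weights[-1]

# Usage: learn the AND function (linearly separable, so the rule converges).
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ys = [0, 0, 0, 1]
model = PerceptronModel(learning_rate=0.1, epochs=20).fit(xs, ys)
```

Here the adjustment model would only be concerned with `learning_rate` and `epochs`; the weights are out of its scope because training determines them.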

Bayesian networks

BNs are one of the most important frameworks for representing and reasoning with probabilistic models and they have been applied to many real-world problems (Castillo et al., 1997, Jensen, 2001, Pearl, 1998).

Throughout the paper, any domain variable will be denoted by an upper-case letter (e.g., X,Y,Xi,Θ) and its values or configurations will be represented by the same letter in lower case (e.g., x,y,xi,θ). Thus, the expression P(Xi=xi), in short P(xi), must be understood as the probability
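Under this notation, the key property of a BN is that the joint probability factorizes into local conditional tables, one per node. The following toy network (our own illustrative example, the classic rain/sprinkler/wet-grass structure with made-up probabilities, not the paper's meta-model) shows the factorization and a posterior computed by brute-force enumeration:

```python
# Toy BN: Rain -> WetGrass <- Sprinkler (R and S are parentless).
def joint(r, s, w):
    """Joint probability P(R=r, S=s, W=w) as the product of the local
    conditional tables -- exactly the factorization a BN encodes."""
    p_r = {True: 0.2, False: 0.8}                 # P(R)
    p_s = {True: 0.1, False: 0.9}                 # P(S)
    p_w_true = {                                  # P(W=true | R, S)
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.8, (False, False): 0.0,
    }
    pw = p_w_true[(r, s)] if w else 1.0 - p_w_true[(r, s)]
    return p_r[r] * p_s[s] * pw

def posterior_rain_given_wet():
    """P(R=true | W=true) by brute-force enumeration over S."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
    return num / den

p_rain = posterior_rain_given_wet()
```

Brute-force enumeration is exponential in the number of variables; real BN engines exploit the factorization with dedicated inference algorithms, but the quantity computed is the same.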

Evaluation of the adjustment model

As a very suitable test bed for measuring the quality of the proposed model, the field of geometric problems was selected. Specifically, this section focuses on illustrating the usefulness of the adjustment model for finding the right parameter values in GAs that operate as a selection mechanism in constructive geometric constraint solvers. The same case study was addressed by Barreiro et al. (2004), and their results are used to evaluate how good this adjustment model is.

Conclusions and future work

In this paper a meta-model for parameter setting based on BNs has been presented. The adjustment model has been designed with the aim of reducing the effort of a user who needs to optimize the parameters Θ of a model M(Θ) which solves a problem P. The main novelty of the proposed model resides in the fact that it is general, i.e., domain independent, and automatic, that is to say, no user supervision is necessary. Another advantage is that the proposed model is easy to implement.

It was

Acknowledgments

Victoria Luzón has been partially supported by the Ministerio de Educación y Ciencia and by FEDER under Grant TIN2004-06326-C03-01. The authors are grateful to Barreiro for providing us with the necessary data for this work.

References (49)

  • C. Essert-Villard et al.

    Sketch-based pruning of a solution space within a formal geometric constraint solver

    Artificial Intelligence Journal

    (2000)
  • A. Aamodt et al.

    Case-based reasoning: foundational issues, methodological variations and system approaches

    AI Communications

    (1994)
  • H. Akaike

    A new look at statistical model identification

    IEEE Transactions on Automatic Control

    (1974)
  • T. Bäck et al.

    Handbook of Evolutionary Computation

    (1997)
  • Baker, J., 1987. Reducing bias and inefficiency in the selection algorithm. in: Proceedings of the 2nd International...
  • Barreiro, E., Joan-Arinyo, R., Luzón, M., 2004. Algoritmos genéticos en el problema de la solución deseada....
  • Y. Bengio

    Gradient-based optimization of hyper-parameters

    Neural Computation

    (2000)
  • J. Bernardo et al.

    Bayesian Theory

    (1994)
  • H. Beyer et al.

    Evolution strategies: a comprehensive introduction

    Natural Computing

    (2002)
  • J. Binder et al.

    Adaptive probabilistic networks with hidden variables

    Machine Learning

    (1997)
  • Buntine, W., 1991. Theory refinement on Bayesian networks. In: Proceedings of the 7th Conference on Uncertainty in...
  • W. Buntine

    Operations for learning with graphical models

    Journal of Artificial Intelligence Research

    (1994)
  • Castillo, E., Gutierrez, J., Hadi, A., 1997. Expert Systems and Probabilistic Network Models, first ed. Springer,...
  • G. Cooper et al.

    A Bayesian method for the induction of probabilistic networks from data

    Machine Learning

    (1992)
  • R. Cowell et al.

    Probabilistic Networks and Expert Systems

    (1999)
  • P. Craven et al.

    Smoothing noisy data with spline functions

Numerische Mathematik

    (1979)
  • Davis, L., 1989. Adapting operator probabilities in genetic algorithms, In: Schaffer, J.D. (Ed.), Proceedings of the...
  • M. DeGroot

    Optimal Statistical Decisions

    (1970)
  • A. Dempster et al.

    Maximum likelihood from incomplete data via the EM algorithm

    Journal of the Royal Statistical Society

    (1977)
  • R. Duda et al.

    Pattern Classification

    (2001)
  • A. Eiben et al.

    Parameter control in evolutionary algorithms

    IEEE Transactions on Evolutionary Computation

    (1999)
  • R. Fletcher

    Practical Methods of Optimization

    (2000)
  • L. Fogel

    Artificial Intelligence through Simulated Evolution. Forty Years of Evolutionary Programming

    (1999)
  • Friedrichs, F., Igel, C., 2004. Evolutionary tuning of multiple svm parameters. In: Proceedings of the 12th European...