This chapter briefly describes some common methods by which people make quantitative estimates of how well they expect empirical models to predict. Its main argument, however, is that fit-to-data, the traditional yardstick for establishing confidence in models, is not the solid ground for such confidence that some people take it to be, especially for the kind of system to which agent-based modelling is usually applied. Further, the chapter shows that the amount of data required to establish confidence in an arbitrary model by fit-to-data is often infeasibly large, unless appropriate ‘big data’ are available. This arbitrariness can be reduced by constraining the choice of model. In agent-based models, these constraints are introduced by their descriptiveness rather than by removing variables from consideration or making assumptions for the sake of simplicity. By comparing with neural networks, we show that agent-based models have a richer ontological structure. This richness means that the ontological structure of agent-based models has greater significance, and yet it is all too commonly taken for granted or assumed to be ‘common sense’. The chapter therefore also discusses some approaches to validating ontologies.
The case for agent-based modelling is that it is necessary to represent all the agents if you want to understand the emergent system-level dynamics.
Part of this is the confusion between ‘free parameters’, which can be adjusted to make the results fit data, and parameters with values that are, at least in theory, empirically observable, even if currently unknown. Agent-based models have a lot of the latter but relatively few of the former.
Macbeth, Act V, Scene V.
http://www.earth-system-dynamics.net/ <Accessed May 2017>.
https://www.commod.org/en <Accessed May 2017>.
http://www.r-project.org/ <Accessed May 2017>.
https://www.wikipedia.org/ <Accessed May 2017>.
We acknowledge funding from the Engineering and Physical Sciences Research Council (award no. 91310127), the European Commission Framework Programme 7 ‘GLAMURS’ project (grant agreement no. 613420) and the Scottish Government Rural Affairs, Food and the Environment Strategic Research Programme, Theme 2: Productive and Sustainable Land Management and Rural Economies. We are also grateful to Bruce Edmonds and Mark Brewer for useful comments on earlier drafts of this chapter; any mistakes are of course our own.
Further Reading
Shalizi’s (2006) book chapter covers approaches to modelling (and measuring) complex systems in a more formal and comprehensive way, with a focus on more traditional mathematical modelling techniques. He also covers issues with validation and penalization of parameters, including discussions of VC theory and Ockham’s razor.
Sowa’s (1999) book on knowledge representation is a good introduction to the field, covering various formalisms and the underlying philosophical questions that the formal representation of knowledge raises. Baader et al.’s (2003) Description Logic Handbook goes into more detail on description logics. Evans and Over’s (2004) book, which goes into some depth on controversies in the formal representation of what otherwise seems to be a simple everyday concept, ‘if-then’, is also highly recommended.
Since one of the ways of validating ontologies is through engaging with stakeholders, the Companion Modelling school of agent-based modelling, pioneered especially by research teams based in France, is well worth familiarizing yourself with. They have a website (footnote 6) and a book (Etienne 2014) as well as several publications illustrating their work. Since they sometimes use ontologies as part of their methodological approach to modelling with stakeholders, the work of authors such as Jean-Pierre Müller, Nicolas Becu and Pascal Perez and their collaborators is particularly worth investigating. Some example articles include Müller (2010), Becu et al. (2003) and Perez et al. (2009). Companion modellers are not the only ones to apply knowledge elicitation to model design, however – see, for example, Bharwani et al. (2015).
Validation has long been a subject of discussion in agent-based modelling, and this chapter has not dedicated space to reviewing the excellent thinking that has already been done on the topic. The interested reader wanting to access some of this literature is advised to look for keywords such as validation, calibration and verification in the Journal of Artificial Societies and Social Simulation (JASSS), currently the principal journal for publication of agent-based social simulation work. Notable recent articles include Schulze et al. (2017), Drchal et al. (2016), ten Broeke et al. (2016) and Lovelace et al. (2015). Older articles worth a read are Elsenbroich (2012), Radax and Rengs (2010) and Rossiter et al. (2010). See also some of the debates, such as Thompson and Derr’s (2009) critique of Epstein’s (2008) article and Troitzsch’s (2009) response, and Moss’s (2008) reflections on Windrum et al.’s (2007) paper. A practical article on one approach to validating agent-based models outwith JASSS is Moss and Edmonds (2005).
Appendix 1: Neural Networks
Though there are variants, typically the excitation, \( x_j \), of a node \( j \) is given by the weighted sum of its inputs (8.2):
\[ x_j = \sum_i w_{ij} o_i \tag{8.2} \]
where \( o_i \) (usually in the range [0, 1], though some formalisms use [−1, 1]) is the output of a node \( i \) with a connection that inputs to node \( j \), and \( w_{ij} \) is the strength (weight) of that connection.
Nonlinearity of the behaviour of the node is critical to the power that the neural network has as an information processing system. It is introduced by making the output \( o_j \) of a node a nonlinear function of its excitation \( x_j \). There are a number of ways this can be achieved. Since many learning algorithms rely on the differentiability of the output with respect to the weights, the sigmoid function is typically used (8.3):
\[ o_j = \frac{1}{1 + e^{-x_j}} \tag{8.3} \]
So, a neural network essentially consists of a directed graph of nodes, where each of the links has a weight. If the graph is acyclic, the neural network is known as a feed-forward network. (If cyclic, the network is recurrent.) Nodes with no input connections are input nodes; those with no output connections are output nodes. Since they have no input connections and hence no excitation, input nodes are often also not given a nonlinear treatment as per (8.3), though this breaks somewhat with the simulation of a neuron. Similarly, nonlinearity may not be applied to output nodes. If there are \( N \) input nodes and \( M \) output nodes, then essentially a feed-forward network without nonlinearity on the output nodes is computing a mapping from \( \mathbb{R}^N \) to \( \mathbb{R}^M \). With nonlinearity, the mapping is from \( \mathbb{R}^N \) to \( [0, 1]^M \).
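As a rough illustration of the description above, the following Python sketch computes the output of a small feed-forward network with one hidden layer. It assumes the weighted-sum excitation referred to as (8.2) and the logistic form of the sigmoid for (8.3); the function names, the layered layout and the weight values are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (assumed form of (8.3)): maps a real excitation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, nonlinear=True):
    """Compute the outputs of one layer of nodes.

    inputs  -- outputs o_i of the nodes feeding this layer
    weights -- weights[i][j] is w_ij, the weight of the link from node i to node j
    """
    n_nodes = len(weights[0])
    outputs = []
    for j in range(n_nodes):
        # Excitation x_j: the weighted sum of the node's inputs, as in (8.2)
        x_j = sum(o_i * weights[i][j] for i, o_i in enumerate(inputs))
        outputs.append(sigmoid(x_j) if nonlinear else x_j)
    return outputs


def feed_forward(inputs, w_hidden, w_output, nonlinear_output=True):
    """Pass inputs through one hidden layer and then the output layer."""
    # Input nodes have no excitation, so their values are passed on untransformed.
    hidden = layer_output(inputs, w_hidden)
    return layer_output(hidden, w_output, nonlinear=nonlinear_output)


# Hypothetical weights for N = 2 input nodes, 2 hidden nodes and M = 1 output node
W_HIDDEN = [[0.5, -1.0], [1.5, 2.0]]
W_OUTPUT = [[1.0], [-2.0]]

# With nonlinearity on the output node, the result lies between 0 and 1
bounded = feed_forward([0.2, 0.9], W_HIDDEN, W_OUTPUT)
# Without it, the output is an unrestricted real number
unbounded = feed_forward([0.2, 0.9], W_HIDDEN, W_OUTPUT, nonlinear_output=False)
```

Switching `nonlinear_output` on and off corresponds to the two mappings contrasted in the text: real-valued outputs without the output nonlinearity, and outputs confined to the unit interval with it.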
Appendix 2: Metrics of and Methods for Validation
Table 8.3 explains various metrics and measures of validation, showing you where to find out more information on them and how to use them with R. For those of you unfamiliar with R, it is a widely used (footnote 7) free (as in open-source and in the financial sense) statistical software package, available for Windows, OS X and Linux (footnote 8). Each of the examples assumes you are validating against a single variable (unless otherwise stated) for which you have a number of samples from your data and corresponding output from your model. The R variable vdata contains the empirical data to validate against (which must not have been used for calibration – though many of the metrics can of course be applied to the calibration process), whilst the variable model contains the corresponding output from the model. The two variables vdata and model are, in R terms, vectors of equal length. If the model predicted the data perfectly, then for each element i of the two vectors, vdata[i] == model[i]. More information on each of the approaches can be found on Wikipedia (footnote 9), in the R documentation and in various machine learning and advanced statistical textbooks.
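To make the vdata and model set-up concrete, here is a minimal sketch in Python rather than R, used purely for illustration. The two error measures shown (root mean squared error and mean absolute error) are assumed examples of the kind of metric in question and are not necessarily those listed in Table 8.3; the data values and function names are hypothetical.

```python
import math


def check_vectors(vdata, model):
    """Both vectors must be non-empty and of equal length."""
    if len(vdata) == 0 or len(vdata) != len(model):
        raise ValueError("vdata and model must be non-empty and of equal length")


def rmse(vdata, model):
    """Root mean squared error between validation data and model output."""
    check_vectors(vdata, model)
    squared = sum((v - m) ** 2 for v, m in zip(vdata, model))
    return math.sqrt(squared / len(vdata))


def mae(vdata, model):
    """Mean absolute error between validation data and model output."""
    check_vectors(vdata, model)
    return sum(abs(v - m) for v, m in zip(vdata, model)) / len(vdata)


# Hypothetical validation data (not used for calibration) and matching model output
vdata = [2.0, 4.0, 6.0, 8.0]
model = [2.5, 3.5, 6.0, 9.0]

print("RMSE:", rmse(vdata, model))
print("MAE:", mae(vdata, model))
```

Both measures are zero only when every element of model equals the corresponding element of vdata, which is the perfect-prediction case described above.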
Appendix 3: Expressivity of Various Modelling Approaches
Description logics use a letter-based notation to describe the axioms each logic has (Baader and Nutt 2003; Calvanese and De Giacomo 2003; Baader et al. 2003). Briefly, \( \mathcal{AL} \) is a basic description logic, and (𝒟) is for data properties; \( \mathcal{C} \) provides more complex class axioms than the basic axioms in \( \mathcal{AL} \); ℛ is for complex relationship assertions such as irreflexivity (all NetLogo links are irreflexive, as you cannot link anything to itself); \( \mathcal{O} \) introduces nominals (a bit like enumerations in Java); ℐ is for inverse relationships; \( \mathcal{N} \) is for numerical restrictions on properties; and ℱ is for functional properties. Table 8.4 provides an initial indication of the description logic expressivity needed to capture the syntax used to specify the ontologies of various modelling approaches. However, the labels applied in the ‘description logic’ column do not mean that the full capabilities of the language are necessarily used.
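The letter-based naming can be read mechanically, as the following Python sketch suggests. It handles only the letters described in this appendix, assumes a name has the form AL, followed by feature letters, followed by an optional (D), and uses a hypothetical example name; other letters found in the description logic literature are deliberately rejected rather than guessed at.

```python
# Feature letters as glossed in this appendix; descriptions paraphrase the text.
FEATURES = {
    "C": "more complex class axioms than the basic axioms in AL",
    "R": "complex relationship assertions such as irreflexivity",
    "O": "nominals",
    "I": "inverse relationships",
    "N": "numerical restrictions on properties",
    "F": "functional properties",
}


def decode(name):
    """Expand a description logic name such as 'ALCOIN(D)' into its features.

    Assumes the simplified form AL + feature letters + optional '(D)'.
    """
    data_properties = name.endswith("(D)")
    if data_properties:
        name = name[:-3]
    if not name.startswith("AL"):
        raise ValueError("this sketch only handles names starting with AL")
    features = ["AL: basic description logic"]
    for letter in name[2:]:
        if letter not in FEATURES:
            raise ValueError("letter not covered by this sketch: " + letter)
        features.append(letter + ": " + FEATURES[letter])
    if data_properties:
        features.append("(D): data properties")
    return features


# Hypothetical example name, chosen only to exercise several letters
for line in decode("ALCOIN(D)"):
    print(line)
```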
Polhill, G., Salt, D. (2017). The Importance of Ontological Structure: Why Validation by 'Fit-to-Data' Is Insufficient. In: Edmonds, B., Meyer, R. (eds) Simulating Social Complexity. Understanding Complex Systems. Springer, Cham.
