
Model change and reliability in scientific inference

Published in Synthese.

An Erratum to this article was published on 06 May 2014

Abstract

One persistent challenge in scientific practice is that the structure of the world can be unstable: changes in the broader context can alter which model of a phenomenon is preferred, all without any overt signal. Scientific discovery becomes much harder when we are chasing a moving target, and the resulting incorrect understandings of relationships in the world can have significant practical consequences. In this paper, we argue that it is common (in certain sciences) for changes of context to lead to changes in the relationships under study, but that standard normative accounts of scientific inquiry have assumed this problem away. At the same time, we show that inference and discovery methods can “protect” themselves in various ways against this possibility by exhibiting the novel methodological virtue of “diligence.” Unfortunately, this desirable virtue is provably incompatible with other desirable methodological virtues that are central to reliable inquiry. No scientific method can provide every virtue that we might want.


Figs. 1, 2 and 3 (figures not reproduced)


Notes

  1. Similarly, if contexts can change, then replications of experiments may have different evidential value than is typically thought.

  2. Of course, there is agreement with those positions that models are context-relative, but even that agreement is tempered by the fact that we focus on different aspects of context. In particular, we are interested in contextual factors that can change the relationships under study, rather than ones such as the pragmatic desires or goals of the scientists. We suspect that most (epistemic or scientific) contextualists would be quite amenable to the conclusions that we reach in this paper, but we think that our primary focus is quite different from their usual concerns.

  3. Although it will not play a role in this paper, we should note that context-driven model change is framework-relative for two distinct reasons. First, context-driven model change requires that the target model change is due to a change in the context. Since every model in a framework has the same context (e.g., every model has the same set of variables), the same changes “in the world” can produce context-driven model change relative to one framework but not relative to another, depending on whether the changing factors are in the context or in the framework’s models. Second, since the target model is the “best of the bunch,” whether the target model changes can depend on the competition (i.e., the other models in the framework).

  4. This example makes the framework-relativity of context-driven model change quite clear, as there would be no change in the target model if our framework included the causal model \(Switch \rightarrow Lights \leftarrow Power\).

  5. For example, the framework might be the set of causal models \(\{Switch \rightarrow Lights,\ Switch\ \ Lights,\ Switch \leftarrow Lights\}\), where the middle model has no edge between \(Switch\) and \(Lights\), and \(C\) might be the \(Power\) state.

  6. Of course, there are many different issues that the cod fishery collapse illuminates, such as the fact that even central planning and control can be insufficient to prevent a tragedy of the commons. We focus here on the model change aspect, but also think that this is a rich case study that has been insufficiently studied in philosophy of science.

  7. Of course, the fish population varies from year to year, but that is not model change as we understand it. The important point here is that the relevant causal/sampling relationship does not change.

  8. Some researchers have argued that, under certain circumstances, reducing the number of fish may actually make them easier to catch, and so \(q\) could potentially be negative. In such contexts, the relationship between \(CPUE\) and \(fish\) would be piecewise linear, rather than a single linear function.

  9. In reading Fig. 2, one must remember that \(fish\) is an unobserved variable. The \(fish\) numbers were retroactively recalculated after fishery scientists realized that there had been context-driven model change. They were not available to the scientists at the time.

  10. There actually were two scientific surveys during this time period, and they unsurprisingly contradicted the significantly higher estimated fish population. Scientists were unsure of how exactly to reconcile the conflicting measurements, so they decided simply to average the CPUE estimates with the survey data, which still grossly overestimated the population.

  11. Hansen (2001) sums up the problem nicely: “Structural change is pervasive in economic time series relationships, and it can be quite perilous to ignore. Inferences about economic relationships can go astray, forecasts can be inaccurate, and policy recommendations can be misleading or worse.” (p. 127)

  12. Of course, it is certainly possible that the accounts discussed in this section could be adjusted to handle model change. We simply aim to show that existing ones do not currently accommodate it.

  13. One might object that accounts of scientific inquiry are about theories, not models (i.e., our focus), and that this makes a meaningful difference. However, it is not clear what the relevant difference would be. Moreover, the two accounts we consider below are both supposed to apply to more focused models, as well as broader theories.

  14. In statistical language, severe tests should have both a low false positive and a low false negative rate. Falsifying tests need only have a low false negative rate (where ‘negative’ means “theory is found false”).

  15. Additionally, her account is relatively localist, as it advocates testing “small” claims (e.g., models) rather than entire scientific theories.

  16. What counts as ‘large’ or ‘frequent’ will depend partly on our inference or learning method. Informally, we need to assume that the target model does not continue to change before we can identify it with the particular inference method in use.

  17. Such anomalies are similar in spirit to Kuhnian anomalies, but are of course on a much smaller scale.

  18. The converse obviously does not hold: different models can produce the same data distributions, so context-driven model change does not necessarily lead to anomalous data. This is an instance of the general problem of underdetermination of models by observed data that affects all model inference methods, including ones that do not accommodate context-driven model changes.

  19. It is tempting to think that one should always throw away all previous data whenever context-driven model change occurs. However, this strategy potentially throws away useful data that could lead one to the target model more rapidly, and also does not allow for the possibility of retracting an anomaly detection judgment.

  20. We conjecture that the general “consistency vs. diligence” tension that we discuss below arises for essentially all methods of scientific inference, but only have a formal proof for statistical estimators. It is an open research question whether this incompatibility can be proven for a broader class of methods, though we note that an enormous part of scientific inquiry consists in statistical estimation. In general, we suspect that part of the reason that normative accounts of science have ignored context-driven model change is precisely because they privilege consistency, and so cannot value diligence in the same way.

  21. This statement is not quite right: it is possible for consistent estimators to be diligent, but only under the very special conditions given in Appendix 3 (“Construction: diligence \(\Rightarrow \lnot \) arbitrary errors”). Roughly, an estimator can be both consistent and diligent only when every model in the framework is sufficiently far from every other model, so that certain data guarantee that model change has occurred. Few realistic frameworks satisfy this condition, though many toy ones do. For example, a framework of deterministic data generating processes with no measurement error meets this condition, as can (sometimes) a framework of deterministic data generating processes with bounded measurement errors.

  22. The proof and precise statements of the above notions are provided in the Appendix.

  23. Of course, there are estimators that are neither consistent nor diligent, but we ignore those here.
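
The CPUE-based reasoning in the fishery notes above (notes 7–10) can be sketched in a small simulation. This is our own illustration under the standard assumption that \(CPUE = q \cdot fish\); all numbers below are invented for illustration, not actual fishery data:

```python
# Illustrative sketch of the CPUE abundance model: CPUE = q * fish, where q
# is the catchability coefficient. If q silently rises (e.g., through better
# technology) while the stock declines, CPUE can stay flat, and an estimator
# that assumes a fixed q keeps reporting a healthy stock. Invented numbers.

def estimate_fish(cpue_value, assumed_q):
    """Invert CPUE = q * fish under a (possibly stale) catchability q."""
    return cpue_value / assumed_q

true_fish = [1000, 800, 640, 512]           # hypothetical declining stock
true_q = [0.10, 0.125, 0.15625, 0.1953125]  # hypothetical rising catchability

cpue = [q * f for q, f in zip(true_q, true_fish)]             # stays flat
estimates = [estimate_fish(c, assumed_q=0.10) for c in cpue]  # stays high
```

Because the observed CPUE never moves, the stale-\(q\) estimates never move either, even as the hypothetical stock halves: a toy version of context-driven model change without any overt signal in the data.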

References

  • Blanchard, O. J., & Simon, J. A. (2001). The long and large decline in US output volatility. Brookings Papers on Economic Activity, 135–164.

  • Carpenter, S. R., Cole, J. J., Pace, M. L., Batt, R., Brock, W. A., Cline, T., et al. (2011). Early warnings of regime shifts: A whole-ecosystem experiment. Science, 332, 1079–1082.

  • Cartwright, N. (1983). How the laws of physics lie. Oxford: Oxford University Press.

  • Cartwright, N. (2007). Hunting causes and using them: Approaches in philosophy and economics. Cambridge: Cambridge University Press.

  • Clements, M. P., & Hendry, D. F. (1998). Forecasting economic time series. Cambridge: Cambridge University Press.

  • Clements, M. P., & Hendry, D. F. (1999). Forecasting non-stationary economic time series. Cambridge: MIT Press.

  • Cooper, R. M., & Zubek, J. P. (1958). Effects of enriched and restricted early environments on the learning ability of bright and dull rats. Canadian Journal of Psychology, 12, 159–164.

  • Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. The Annals of Statistics, 25(1), 1–37.

  • Dahlhaus, R., & Polonik, W. (2009). Empirical spectral processes for locally stationary time series. Bernoulli, 15(1), 1–39.

  • Davies, J., & Davies, D. (2010). Origins and evolution of antibiotic resistance. Microbiology and Molecular Biology Reviews, 74(3), 417–433.

  • Davis, R. A., Lee, T., & Rodriguez-Yam, G. (2006). Structural break estimation for nonstationary time series models. Journal of the American Statistical Association, 101, 229–239.

  • Davis, R. A., & Rodriguez-Yam, G. (2008). Break detection for a class of nonlinear time series models. Journal of Time Series Analysis, 29, 834–867.

  • Duffy, M. A., & Sivars-Becker, L. (2007). Rapid evolution and ecological host–parasite dynamics. Ecology Letters, 10, 44–53.

  • Earman, J. (1992). Bayes or bust? A critical examination of Bayesian confirmation theory. Cambridge: The MIT Press.

  • Edmonds, B. (2007). The practical modelling of context-dependent causal processes. Chemistry and Biodiversity, 4, 2386–2395.

  • Eells, E. (1991). Probabilistic causality. Cambridge: Cambridge University Press.

  • Estes, J. A. (2011). Trophic downgrading of planet earth. Science, 333, 301–306.

  • Finlayson, A. C. (1994). Fishing for truth: A sociological analysis of northern cod stock assessments from 1977–1990. Institute of Social and Economic Research, Memorial University of Newfoundland.

  • Glymour, B. (2008). Stable models and causal explanation in evolutionary biology. British Journal for the Philosophy of Science, 59, 835–855.

  • Glymour, B. (2011). Modeling environments: Interactive causation and adaptations to environmental conditions. Philosophy of Science, 78(3), 448–471.

  • Hansen, B. E. (2001). The new econometrics of structural change: Dating breaks in US labor productivity. The Journal of Economic Perspectives, 15(4), 117–128.

  • Howson, C., & Urbach, P. (1993). Scientific reasoning: The Bayesian approach (2nd ed.). Chicago: Open Court. (Original work published 1989).

  • Kurlansky, M. (1997). Cod: A biography of the fish that changed the world. New York: Walker and Company.

  • Lucas, R. E., McGrattan, E., Phelan, C., Prescott, E., Rossi-Hansberg, E., Sargent, T., et al. (2003). Macroeconomic priorities. American Economic Review, 93, 1–14.

  • Mayo, D. G. (1991). Novel evidence and severe tests. Philosophy of Science, 58, 523–552.

  • Mayo, D. G. (1996). Error and the growth of experimental knowledge. Chicago: University of Chicago Press.

  • Mayo, D. G. (1997). Severe tests, arguing from error, and methodological underdetermination. Philosophical Studies, 86, 243–266.

  • Mayo, D. G., & Spanos, A. (2006). Severe testing as a basic concept in a Neyman–Pearson philosophy of induction. British Journal for the Philosophy of Science, 57, 323–357.

  • McGuire, T. (1997). The last northern cod. Journal of Political Ecology, 4, 41–54.

  • Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.

  • Perron, P. (2006). Dealing with structural breaks. In K. Patterson & T. C. Mills (Eds.), Palgrave handbook of econometrics (Vol. 1, pp. 278–352). Basingstoke: Palgrave Macmillan.

  • Popper, K. (1963). Conjectures and refutations. London: Routledge.

  • Sarkar, S., & Fuller, T. (2003). Generalized norms of reaction for ecological developmental biology. Evolution and Development, 5, 106–115.

  • Scheffer, M., & Carpenter, S. R. (2003). Catastrophic regime shifts in ecosystems: Linking theory to observation. Trends in Ecology and Evolution, 18, 648–656.

  • Scheiner, S. M. (1993). Genetics and evolution of phenotypic plasticity. Annual Review of Ecology and Systematics, 24, 35–68.

  • Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). Cambridge: MIT Press.

  • Yoshida, T., Ellner, S. P., Jones, L. E., Bohannan, B. J. M., & Lenski, R. E. (2007). Cryptic population dynamics: Rapid evolution masks trophic interactions. PLoS Biology, 5, 1868–1879.

  • Yoshida, T., Jones, L. E., Ellner, S. P., Fussmann, G. F., & Hairston, N. G., Jr. (2003). Rapid evolution drives ecological dynamics in a predator–prey system. Nature, 424, 303–306.

Author information

Corresponding author

Correspondence to Erich Kummerfeld.

Appendices

Appendix 1: Notation

Let \(X\) represent a random sequence of data. Let \(X_B^t\) represent a random subsequence of length \(t\) of data generated from distribution \(B\). Let \(\mathbf{F}\) be a framework (in this case, a set of distributions). Let \(M_\mathbf{F}\) be a method that takes a data sequence \(X\) as input and outputs a distribution \(B \in \mathbf{F}\); we will typically drop the subscript \(\mathbf{F}\) from \(M\), as we will be dealing with a single framework at a time. Concretely, \(M[X_B^t]=O\) means that \(M\) outputs \(O\) after observing the sequence \(X_B^t\). Let \(D\) be a distance metric over distributions (e.g., the Anderson–Darling statistic). Let \(D_\delta (A,B)\) be shorthand for the inequality \(D(A,B)<\delta \). Finally, let \([X,Y]\) denote the concatenation of sequence \(X\) with sequence \(Y\).

Definition

A distribution \(A\) is absolutely continuous with respect to another distribution \(B\) iff \(\forall x\ P_{B}(x)=0 \Rightarrow P_{A}(x)=0\). That is, if \(B\) gives probability \(0\) to some event \(x\), then \(A\) also gives probability \(0\) to that event. Let \(AC(A)\) be the set of distributions, excluding \(A\) itself, that are absolutely continuous with respect to \(A\).

Definition

An estimator \(M\) is consistent if \(\forall B\in \mathbf{F}\ \forall \delta >0\ \lim _{t\rightarrow \infty } P(D_\delta (M[X_B^t],B)) = 1\). That is, for every distribution in the framework, the probability that \(M\)’s output is arbitrarily close to the target distribution approaches 1 as the amount of data increases to infinity.
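
A quick simulation illustrates this definition. The toy framework here is our own construction, not the paper's: unit-variance normal distributions indexed by their means, with \(D(A,B)\) taken to be the absolute difference of the means, and the sample mean as the estimator \(M\). All numeric choices are illustrative:

```python
# Toy illustration of (pointwise) consistency: the framework F is the set of
# unit-variance normals, D(A, B) = |mean(A) - mean(B)|, and the estimator M
# is the sample mean. With enough data, M's output lands within delta of the
# target distribution with high probability.
import random

random.seed(0)

TRUE_MEAN = 2.0   # the target distribution B is N(TRUE_MEAN, 1)
DELTA = 0.1       # the closeness threshold delta

def sample_mean_estimator(data):
    return sum(data) / len(data)

def error_after(t):
    """Distance D between M's output and the target after t observations."""
    data = [random.gauss(TRUE_MEAN, 1.0) for _ in range(t)]
    return abs(sample_mean_estimator(data) - TRUE_MEAN)

small_t_error = error_after(10)       # noisy with little data
large_t_error = error_after(100_000)  # within DELTA with high probability
```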

Definition

An estimator \(M\) can be forced to make arbitrary errors if \(\forall B_1\in \mathbf{F}\ \forall B_2\in AC(B_1)\cap \mathbf{F}\ \forall \delta , \epsilon >0\ \forall n_2 \exists n_1 P(D_\delta (M[X_{B_1}^{n_1},X_{B_2}^{n_2}],B_2))\le \epsilon \). That is, consider any distribution \(B_2\) which is in the framework and is absolutely continuous with respect to \(B_1\). Then for any amount of data \(n_2\) from \(B_2\), there is an amount of data \(n_1\) from \(B_1\) such that \(M\)’s output will still be arbitrarily unlikely to be arbitrarily close to \(B_2\) after seeing the \(n_1 + n_2\) data.
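
The same toy setting (our own construction, not the paper's) shows how a consistent estimator, here the running sample mean, exhibits this vulnerability: for any fixed batch of \(n_2\) points from the new distribution \(B_2\), a sufficiently long prefix of \(n_1\) points from \(B_1\) swamps it. Distributions and sample sizes below are illustrative assumptions:

```python
# Sketch of "arbitrary errors": after a long prefix of B1 = N(0, 1) data,
# a fixed batch of N2 points from B2 = N(5, 1) barely moves the running
# sample mean, so the output remains far from B2.
import random

random.seed(1)

N2 = 100                      # fixed amount of post-change data from B2
B1_MEAN, B2_MEAN = 0.0, 5.0   # B1 = N(0, 1), B2 = N(5, 1)

def running_mean_after_change(n1):
    prefix = [random.gauss(B1_MEAN, 1.0) for _ in range(n1)]
    suffix = [random.gauss(B2_MEAN, 1.0) for _ in range(N2)]
    combined = prefix + suffix
    return sum(combined) / len(combined)

# With n1 much larger than n2, the output stays near B1's mean, far from B2's:
estimate = running_mean_after_change(100_000)
```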

Appendix 2: Lemma: consistency \(\Rightarrow \) arbitrary errors (within AC)

Proof

Assume \(M\) is consistent. It suffices to show that:

$$\begin{aligned} \forall B_1 \in \mathbf{F}\ \forall n_2 >0\ \forall B_2\in AC(B_1)\cap \mathbf{F}\ \forall \delta >0\ \forall \epsilon <1\ \exists n_1 P(D_\delta (M[X_{B_1}^{n_1},X_{B_2}^{n_2}],B_1))>\epsilon \end{aligned}$$

That is, even if a finite sequence of data drawn from \(B_2\) is appended to the end of the \(B_1\) data, there is some amount \(n_1\) of \(B_1\) data such that \(M\)’s output is still arbitrarily likely to be arbitrarily close to \(B_1\).

Choose arbitrary \(B_1\), \(B_2\) and \(n_2\). Let \(S\) be the set of all events in the metric space that, if satisfied by \(X_{B_2}^{n_2}\), would stop \(M\) from converging to \(B_1\). That is, let \(S\) be the set of all events that, if satisfied by \(X_{B_2}^{n_2}\), would entail the negation of:

$$\begin{aligned} \forall \delta >0\ \forall \epsilon <1 \ \exists n_1P(D_\delta (M[X_{B_1}^{n_1},X_{B_2}^{n_2}],B_1))>\epsilon \end{aligned}$$

Since \(M\) is consistent for \(B_1\), \(P(X_{B_1}^{n_2}\in S)=0\). Since \(B_2\) is absolutely continuous with respect to \(B_1\), \(P(X_{B_2}^{n_2}\in S)=0\) as well. It is thus at most a probability \(0\) event that \(X_{B_2}^{n_2}\) takes a value that prevents \(M\) from converging to \(B_1\), so \(M\) will still converge in probability to \(B_1\) over sequences of the form \([X_{B_1}^{n_1},X_{B_2}^{n_2}]\). \(\square \)

Appendix 3: Construction: diligence \(\Rightarrow \lnot \) arbitrary errors

We construct the formal definition of diligence from that of “arbitrary errors” (AE) in a way that makes it clear that diligent methods are not subject to arbitrary errors. The negation of AE is:

$$\begin{aligned} \exists B_1 \in \mathbf{F}\ \exists B_2\in AC(B_1)\cap \mathbf{F}\ \exists \delta >0\ \exists \epsilon >0\ \exists n_2 \forall n_1 P(D_\delta (M[X_{B_1}^{n_1},X_{B_2}^{n_2}],B_2))> \epsilon \end{aligned}$$

This condition is, however, too weak to capture diligence: we want to avoid such errors for all pairs of distributions in the framework, not just for some absolutely continuous pair. We thus strengthen the negation of AE by turning the two leading existential quantifiers into universal quantifiers and by extending the domain of the universal quantifier over \(B_2\) to include those distributions which are not absolutely continuous with respect to \(B_1\):

Definition

An estimator \(M\) is diligent if

$$\begin{aligned} \forall B_1\in \mathbf{F}\ \forall B_2\in \mathbf{F}\backslash B_1\ \forall \delta >0\ \exists \epsilon >0\ \exists n_2 \forall n_1 P(D_\delta (M[X_{B_1}^{n_1}, X_{B_2}^{n_2}], B_2)) > \epsilon . \end{aligned}$$

That is, for any pair of distinct distributions in the framework and any \(\delta > 0\), there is an amount of data \(n_2\) from \(B_2\) such that, no matter how much data \(n_1\) from \(B_1\) precedes it, \(M\)’s output is within \(\delta \) of \(B_2\) with probability greater than \(\epsilon \) after seeing the \(n_1 + n_2\) data.
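
One simple way to obtain a diligent (but not consistent) estimator, in the same toy normal-mean setting used above, is a mean over a sliding window of fixed size. This is our own hedged sketch, not a construction from the paper; the window size and distributions are illustrative assumptions:

```python
# Sketch of a diligent but inconsistent estimator: a mean over a sliding
# window of fixed size W. Once n2 >= W points from B2 have arrived, the
# output depends only on B2 data, however long the B1 prefix was. The price
# is that the window never grows, so the estimate's variance does not shrink
# to zero and the estimator is not consistent.
import random

random.seed(2)

W = 200  # fixed window size

def windowed_mean(data, w=W):
    window = data[-w:]
    return sum(window) / len(window)

b1_prefix = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # old context
b2_batch = [random.gauss(5.0, 1.0) for _ in range(W)]         # n2 = W new points

estimate = windowed_mean(b1_prefix + b2_batch)
```

Once the window is filled with post-change data, the prefix is irrelevant: the output is exactly the mean of the last \(W\) points, however long the prefix was.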

Definition

A framework \(\mathbf{F}\) is nontrivial iff there exists some \(B \in \mathbf{F}\) such that \(AC(B)\cap \mathbf{F} \ne \emptyset \).

Clearly, diligence implies the negation of AE for all nontrivial frameworks. We thus have the key theorem for this paper:

Theorem

No statistical estimator for a (nontrivial) framework is both consistent and diligent.

Proof

Assume \(M\) is both consistent and diligent. Its consistency implies that AE holds for it. Its diligence, along with the nontriviality of the framework, implies that \(\lnot \)AE holds for it. Contradiction, and so no \(M\) can be both consistent and diligent for a nontrivial framework. \(\square \)

Appendix 4: Generalizing diligence

A natural generalization of diligence yields a novel methodological virtue: Uniform Diligence. Uniform diligence is a strengthening of regular (pointwise) diligence in the same way that uniform consistency is a strengthening of pointwise consistency. Instead of requiring only that, for each \(B_1, B_2\) and \(\delta \), there be some \(n_2\), Uniform Diligence requires that there be some \(n_2\) which works for all such combinations.

Definition

An estimator \(M\) is uniformly diligent if

$$\begin{aligned} \exists n_2 \forall B_1\in \mathbf{F}\ \forall B_2\in \mathbf{F}\backslash B_1\ \forall \delta >0\ \exists \epsilon >0\ \forall n_1 P(D_\delta (M[X_{B_1}^{n_1},X_{B_2}^{n_2}],B_2))> \epsilon . \end{aligned}$$

Obviously, consistency and uniform diligence are also incompatible, as the latter is a strengthening of diligence. The following chart shows three different ways of ordering the quantifiers in the definition of diligence, producing methodological virtues of varying strength; the weakest, responsiveness, is not incompatible with consistency. For space and clarity, \(\mathbf{B}\) abbreviates \(\forall B_1\in \mathbf{F}\ \forall B_2\in \mathbf{F}\backslash B_1\ \forall \delta >0\ \exists \epsilon >0\).

Responsiveness: \(\mathbf{B}\ \forall n_1\ \exists n_2\)

Diligence: \(\mathbf{B}\ \exists n_2\ \forall n_1\)

Uniform diligence: \(\exists n_2\ \mathbf{B}\ \forall n_1\)


About this article

Cite this article

Kummerfeld, E., Danks, D. Model change and reliability in scientific inference. Synthese 191, 2673–2693 (2014). https://doi.org/10.1007/s11229-014-0408-3
