Abstract
Arguments that medical decision making should rely on a variety of evidence often begin from the claim that meta-analysis has been shown to be problematic. In this paper, I first examine Stegenga’s (Stud Hist Philos Sci Part C Stud Hist Philos Biol Biomed Sci 42:497–507, 2011) argument that meta-analysis requires multiple decisions and thus fails to provide an objective ground for medical decision making. Next, I examine three arguments from social epistemologists who contend that meta-analyses are systematically biased in ways not appreciated by standard epistemology. In most cases I show that critiques of meta-analysis fail to account for the full range of meta-analytic procedures. In the remaining cases, I argue that the critiques identify problems that do not uniquely cut against meta-analysis. I close by suggesting one reason why it may be pragmatically rational to violate the principle of total evidence and by outlining the criteria for a successful argument against meta-analysis, a set of criteria that I contend remains unmet.
Notes
What these arguments share is the claim that RCTs fail to provide the theoretical knowledge needed to extrapolate from RCT evidence to a proposed intervention; for discussion of the internal differences among these views, see Dragulinescu (2012).
Presumably, if a principled reason can be given for one choice over the other, then constraint is not the problem. Trivially, I could “choose” to miscalculate the effect size and reach a wildly different conclusion than others, but no one would think that such a choice would threaten the reliability of meta-analysis. A method lacks constraint only if its proper application fails to yield reasonably similar results.
Active placebos are substances that cause side-effects similar to those of a drug, but which are not effective treatments.
In other words, the presence or absence of side-effects alters a patient’s beliefs about whether they are receiving treatment. The third hypothesis proposed that active placebos and “antidepressants” are more effective than inert placebos in causing patients to believe they are receiving treatment, thereby amplifying the placebo effect. That is, it claims that what we consider to be antidepressants are themselves nothing more than active placebos.
I am grateful to an anonymous reviewer for pushing me on this point.
In order for a result that is consistent with a theory to provide evidential support, it has to arise from a test of that theory (Laudan 1990, pp. 61–65). Even if the data are equally consistent with both the theories put forward by Fountoulakis and Möller (2011) and by Kirsch et al. (2008), the data do not provide evidential support for Fountoulakis and Möller because their theory was created to be consistent with the data; whereas Kirsch et al.’s theory was tested by the data and thus derives evidential support from this and other previous tests.
While this may be true of theory choice, we may nevertheless have non-epistemic reasons to eliminate certain treatment options from ethical medical practice. At some point further human experimentation becomes ethically prohibited because the expectation of benefit becomes too small to justify the expected risks.
The adequacy of inert placebos has previously been brought to bear upon the philosophical disputes over the nature of a placebo (Holman 2015; Howick 2017). Though I have suggested a statistical procedure as an alternative to active placebos, the point here is that no further trials are likely to be conducted that use active placebos as a control.
I use the terms “proponents” and “critics” somewhat cautiously; however, both teams of researchers had previously published papers either supporting or questioning the use of antidepressants. Moreover, a survey of articles that cite the studies suggests that there may be disciplinary considerations at play: psychiatrists tend to cite Quitkin et al. (2000) in support of the use of antidepressants, while psychologists, whose form of therapy can be seen as a market competitor to pharmaceutical treatment, tend to cite Moncrieff et al. (2004). As such, disciplinary allegiance seems to be a fairly good predictor of which study is regarded as definitive.
Though unstated, their coding decisions can be reconstructed on a case-by-case basis by comparing the data provided with the original articles. For the quantitative measures, it seems that they have used a 50% improvement from baseline (sketched below). However, there does not seem to be a uniform principle guiding the decisions on the various clinical impression scales: in some cases they count the top two possible ratings as a “responder,” while in another they count only the top rating. Another possible coding scheme is the qualitative descriptions in the studies, but these too are applied inconsistently. For example, in one study a participant whom a clinician rated as “moderately improved” was coded as a responder, while in another study the same rating was coded as a non-responder.
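The reconstructed rule for the quantitative measures can be made precise. Here is a minimal sketch, assuming scales on which lower scores indicate less severe depression (as on the Hamilton scale); the function name, threshold handling, and example scores are mine, not Moncrieff et al.’s.

```python
# Hypothetical reconstruction of the 50%-improvement-from-baseline rule;
# assumes lower scores indicate improvement.
def is_responder(baseline_score, final_score, threshold=0.5):
    """Code a participant as a responder if their score improved by at
    least `threshold` (default 50%) relative to baseline."""
    improvement = (baseline_score - final_score) / baseline_score
    return improvement >= threshold

print(is_responder(28, 12))  # 57% improvement -> responder (True)
print(is_responder(28, 16))  # 43% improvement -> non-responder (False)
```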
In personal communication, Jukola notes that while tools like PRISMA may resolve some disputes, if quality assessment tools require intractable and arbitrary choices, then there would still be a violation of procedural objectivity. I agree that this is a possibility, but leave it an open question whether such choices are required and, if so, whether they regularly lead to divergent conclusions. I revisit this point in Sect. 3.
Formally, from the central limit theorem it can be shown that the standard error of the estimate decreases as the sample size increases. Thus, small trials will show a wide range of effect sizes.
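To make the point concrete, the following simulation (my own illustration, not from the paper) draws repeated two-arm trials at several sample sizes; the assumed true effect, outcome variance, and trial sizes are arbitrary.

```python
# Illustrative simulation: the spread of effect-size estimates shrinks
# roughly as 1/sqrt(n), per the central limit theorem.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.3  # assumed true standardized mean difference

for n in [20, 80, 320, 1280]:  # hypothetical participants per arm
    estimates = []
    for _ in range(1000):
        treated = rng.normal(true_effect, 1.0, n)
        control = rng.normal(0.0, 1.0, n)
        estimates.append(treated.mean() - control.mean())
    print(f"n={n:5d}  SD of estimated effects = {np.std(estimates):.3f}")
```

Quadrupling the per-arm sample size roughly halves the spread of the estimates, which is why a wide, funnel-shaped scatter is expected among small trials.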
The graphs were created by extracting data from online supplementary material (Turner et al. 2008). The sample sizes are taken from Appendix Table A and the effect sizes from Appendix Table C. For studies containing multiple doses, the sample size is the sum of all the sub-groups and the effect size is a weighted average of the sub-group effects (sketched below). Note that the second graph is not simply the first with the unpublished studies removed: the published studies also differed in other ways, such as which outcome measures were reported (or suppressed). Nevertheless, the effect of publication bias is still plain in the reported data.
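The combining rule for multi-dose studies is simple enough to state as code. A hedged sketch, with invented numbers rather than values from Turner et al. (2008):

```python
# Pooling dose arms within one study: total n is the sum of the arms,
# and the study-level effect is a sample-size-weighted average.
def combine_dose_arms(arms):
    """arms: list of (sample_size, effect_size) pairs for one study."""
    total_n = sum(n for n, _ in arms)
    pooled_effect = sum(n * g for n, g in arms) / total_n
    return total_n, pooled_effect

# Hypothetical study with three dose arms:
print(combine_dose_arms([(40, 0.25), (38, 0.31), (45, 0.18)]))
```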
This is not invariably so; in the next section I discuss a threat to the reliability of meta-analysis that cannot be statistically corrected for.
The agents in our model conduct a series of experiments testing the efficacy of a treatment and update their beliefs on the basis of both their own experiments and the results of their peers. Functionally speaking, the agents in our model are Bayesian agents, but given the weak initial strength of the agents’ priors, the manner in which they update approaches the result of a frequentist meta-analysis quite quickly. Though our concern is primarily with the ability of industry funding to promote biased measurements, a straightforward corollary of our results is that meta-analyses will yield biased estimates in some circumstances. As noted below, though they develop the concern in less detail, both Jukola (2017) and Stegenga (2015) explicitly cite problems with measurement in developing their arguments.
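Though the published model is richer than this, the convergence claim can be illustrated with a conjugate sketch of my own devising: Beta-binomial agents with a weak Beta(1, 1) prior pool shared trial results, and their posterior mean tracks the frequentist pooled proportion (the analogue of a fixed-effect meta-analytic estimate). All parameters here are illustrative assumptions.

```python
# Beta-binomial updating on a shared stream of trial results; with a
# weak prior, the posterior mean quickly approaches the pooled
# frequentist estimate.
import numpy as np

rng = np.random.default_rng(1)
true_rate = 0.6          # assumed true recovery rate under treatment
alpha, beta = 1.0, 1.0   # weak Beta(1, 1) prior

successes_total, n_total = 0, 0
for _ in range(10):      # ten trials, each shared among the agents
    n = 50
    successes = rng.binomial(n, true_rate)
    alpha += successes
    beta += n - successes
    successes_total += successes
    n_total += n

print(f"posterior mean     = {alpha / (alpha + beta):.4f}")
print(f"pooled frequentist = {successes_total / n_total:.4f}")
```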
An example of a retrospective analysis is provided by DES, one of the largest drug disasters of the late 1970s. Though it was widely prescribed between 1950 and 1970 to prevent miscarriage, reliance on meta-analytic evidence would have prevented its use after 1954 (Bamigboye and Morris 2003). For additional cases used explicitly in an argument for meta-analysis, see Chalmers (2005). It might be objected that though such case series begin to establish a track record, what is strictly demanded is a comprehensive analysis of all treatments; otherwise, it is possible that supporting instances have been cherry-picked. This enormous undertaking is exactly what was provided, sub-discipline by sub-discipline, in The Oxford Database of Perinatal Trials, Effective Care in Pregnancy and Childbirth, A Guide to Effective Care in Pregnancy and Childbirth, and Effective Care of the Newborn Infant. Others would subsequently follow (see Chalmers et al. 1997 for important milestones).
References
Antonuccio, D., Burns, D., & Danton, W. (2002). Antidepressants: A triumph of marketing over science? Prevention and Treatment, 5, 25.
Antonuccio, D., Danton, W., DeNelsky, G., Greenberg, R., & Gordon, J. (1999). Raising questions about antidepressants. Psychotherapy and Psychosomatics, 68, 3–14.
Bamigboye, A. A., & Morris, J. (2003). Oestrogen supplementation, mainly diethylstilbestrol, for preventing miscarriages and other adverse pregnancy outcomes. Cochrane Database of Systematic Reviews, 2003(3), CD004353.
Biddle, J. (2013). State of the field: Transient underdetermination and values in science. Studies in History and Philosophy of Science Part A, 44, 124–133.
Broadbent, A. (2011). Inferring causation in epidemiology: Mechanisms, black boxes, and contrasts. In P. McKay Illari, F. Russo, & J. Williamson (Eds.), Causality in the sciences (pp. 45–69). Oxford: Oxford University Press.
Brown, W. A. (2002). Are antidepressants as ineffective as they look? Prevention and Treatment, 5, 24c.
Cartwright, N. (2009). What is this thing called “efficacy”? In C. Mantzavinos (Ed.), Philosophy of the social sciences: Philosophical theory and scientific practice (pp. 185–206). Cambridge: Cambridge University Press.
Cartwright, N. (2011). A philosopher’s view of the long road from RCTs to effectiveness. The Lancet, 377, 1400–1401.
Chalmers, I. (1991). Electronic publication of continuously updated overviews (meta-analyses) of controlled trials. International Society of Drug Bulletins Review, 1, 15–18.
Chalmers, I. (2005). The scandalous failure of scientists to cumulate scientifically. Abstract of paper presented at the Ninth World Congress on Health Information and Libraries (pp. 20–23).
Chalmers, I., Sackett, D., & Silagy, C. (1997). In A. Maynard & I. Chalmers (Eds.), Non-random reflections on health services research: On the 25th anniversary of Archie Cochrane’s Effectiveness and efficiency (pp. 231–249). London: BMJ Publishing.
Clarke, B., Gillies, D., Illari, P., Russo, F., & Williamson, J. (2014). Mechanisms and the evidence hierarchy. Topoi, 33, 339–360.
Clarke, M., Hopewell, S., & Chalmers, I. (2007). Reports of clinical trials should begin and end with up-to-date systematic reviews of other relevant evidence: A status report. Journal of the Royal Society of Medicine, 100, 187–190.
Cochrane, A. L. (1972). Effectiveness and efficiency: Random reflections on health services. Oxford: Oxford University Press.
Dragulinescu, S. (2012). On ‘stabilising’ medical mechanisms, truth-makers and epistemic causality: A critique to Williamson and Russo’s approach. Synthese, 187, 785–800.
Elias, M. (2002). Study: Antidepressant barely better than placebo. USA Today. https://usatoday30.usatoday.com/news/health/drugs/2002-07-08-antidepressants.htm.
Fergusson, D., Glass, K. C., Hutton, B., & Shapiro, S. (2005). Randomized controlled trials of aprotinin in cardiac surgery: Could clinical equipoise have stopped the bleeding? Clinical Trials, 2, 218–232.
Fountoulakis, K. N., & Möller, H. J. (2011). Efficacy of antidepressants: A re-analysis and re-interpretation of the Kirsch data. International Journal of Neuropsychopharmacology, 14, 405–412.
Friedman, A., Granick, S., Cohen, H., & Cowitz, B. (1966). Imipramine (Tofranil) vs. placebo in hospitalised psychotic depressives. Journal of Psychiatric Research, 4, 13–36.
Furberg, C. D. (1983). Effect of antiarrhythmic drugs on mortality after myocardial infarction. The American Journal of Cardiology, 52(6), C32–C36.
Gilbert, R., Salanti, G., Harden, M., & See, S. (2005). Infant sleeping position and the sudden infant death syndrome: Systematic review of observational studies and historical review of recommendations from 1940 to 2002. International Journal of Epidemiology, 34, 874–887.
Goldman, A. (1999). Knowledge in a social world. New York, NY: Oxford University Press.
Grim, P., Rosenberger, R., Rosenfeld, A., Anderson, B., & Eason, R. E. (2013). How simulations fail. Synthese, 190, 2367–2390.
Healy, D. (2012). Pharmageddon. Berkeley: University of California Press.
Hergovich, A., Schott, R., & Burger, C. (2010). Biased evaluation of abstracts depending on topic and conclusion: Further evidence of a confirmation bias within scientific psychology. Current Psychology, 29, 188–209.
Hine, L. K., Laird, N., Hewitt, P., & Chalmers, T. C. (1989). Meta-analytic evidence against prophylactic use of lidocaine in acute myocardial infarction. Archives of Internal Medicine, 149, 2694–2698.
Hollister, L., Overall, J., Johnson, M., Pennington, V., Katz, G., & Shelton, J. (1964). Controlled comparison of Imipramine, Amitriptyline and placebo in hospitalised depressed patients. Journal of Nervous and Mental Disease, 139, 370–375.
Holman, B. (2015). Why most sugar pills are not placebos. Philosophy of Science, 82, 1330–1343.
Holman, B. (2017). Philosophers on drugs. Synthese. https://doi.org/10.1007/s11229-017-1642-2.
Holman, B., & Bruner, J. (2017). Experimentation by industrial selection. Philosophy of Science, 84, 1008–1019.
Horwitz, R. I., & Feinstein, A. R. (1981). Improved observational method for studying therapeutic efficacy: Suggestive evidence that lidocaine prophylaxis prevents death in acute myocardial infarction. JAMA, 246, 2455–2459.
Howick, J. (2012). The philosophy of evidence-based medicine. West Sussex: British Medical Journal Books.
Howick, J. (2017). The relativity of ‘placebos’: Defending a modified version of Grünbaum’s definition. Synthese, 194, 1363–1396.
Jukola, S. (2015). Meta-analysis, ideals of objectivity, and the reliability of medical knowledge. Science and Technology Studies, 28, 101–120.
Jukola, S. (2017). On ideals of objectivity, judgment, and bias in medical research-A comment on Stegenga. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biology and Biomedical Sciences, 62, 35–41.
Kirsch, I. (1998a). Reducing noise and hearing placebo more clearly. Prevention and Treatment, 1, 7r.
Kirsch, I. (1998b). On the importance of reading carefully: A response to Klein. Prevention and Treatment, 1, 9r.
Kirsch, I., Deacon, B. J., Huedo-Medina, T. B., Scoboria, A., Moore, T. J., & Johnson, B. T. (2008). Initial severity and antidepressant benefits: A meta-analysis of data submitted to the Food and Drug Administration. PLoS Medicine, 5, e45.
Kirsch, I., Moore, T. J., Scoboria, A., & Nicholls, S. S. (2002). The emperor’s new drugs: An analysis of antidepressant medication data submitted to the U.S. Food and Drug Administration. Prevention and Treatment, 5, 23.
Kirsch, I., Scoboria, A., & Moore, T. (2002). Antidepressants and placebos: Secrets, revelations, and unanswered questions. Prevention and Treatment, 5, 33.
Kirsch, I., & Sapirstein, G. (1998). Listening to Prozac but hearing placebo: A meta-analysis of antidepressant medication. Prevention and Treatment, 1, 2a.
Kitcher, P. (1993). The advancement of science: Science without legend, objectivity without illusions. Oxford: Oxford University Press.
Klein, D. F. (1998a). Listening to meta-analysis but hearing bias. Prevention and Treatment, 1, 6c.
Klein, D. F. (1998b). Reply to Kirsch’s rejoinder regarding antidepressant meta-analysis. Prevention and Treatment, 1, 8r.
Koehler, J. J. (1993). The influence of prior beliefs on scientific judgments of evidence quality. Organizational Behavior and Human Decision Processes, 56, 23–55.
Kourany, J. A. (2010). Philosophy of science after feminism. New York, NY: Oxford University Press.
Lakatos, I. (1978). The methodology of scientific research programmes Volume 1: Philosophical papers (Vol. 1). Cambridge: Cambridge University Press.
Landes, J., Osimani, B., & Poellinger, R. (Forthcoming). Epistemology of causal inference in pharmacology: Towards a framework for the assessment of harm. European Journal for Philosophy of Science.
Laudan, L. (1990). Science and relativism: Some key controversies in the philosophy of science. Chicago, IL: University of Chicago Press.
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P., et al. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. PLoS Medicine, 6(7), e1000100.
Longino, H. E. (1990). Science as social knowledge: Values and objectivity in scientific inquiry. Princeton, NJ: Princeton University Press.
Longino, H. E. (2002). The fate of knowledge. Princeton, NJ: Princeton University Press.
Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37, 2098.
Mahoney, M. J. (1977). Publication prejudices: An experimental study of confirmatory bias in the peer review system. Cognitive Therapy and Research, 1, 161–175.
Moerman, D. E. (2002). The loaves and the fishes: A comment on “The emperor’s new drugs: An analysis of antidepressant medication data submitted to the US Food and Drug Administration”. Prevention and Treatment, 5, 29.
Moher, D., Tetzlaff, J., Tricco, A. C., Sampson, M., & Altman, D. G. (2007). Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine, 4, e78.
Moncrieff, J. (2001). Are antidepressants overrated? A review of methodological problems in antidepressant trials. The Journal of Nervous and Mental Disease, 189, 288–295.
Moncrieff, J., Wessely, S., & Hardy, R. (1998). Meta-analysis of trials comparing antidepressants with active placebos. The British Journal of Psychiatry, 172, 227–231.
Moncrieff, J., Wessely, S., & Hardy, R. (2004). Active placebos versus antidepressants for depression. New York City: The Cochrane Library.
Montgomery, S. A. (1994). Clinically relevant effect sizes in depression. European Neuropsychopharmacology, 4, 283–284.
Moore, T. (1995). Deadly medicines: Why tens of thousands of heart patients died in America’s worst drug disaster. New York, NY: Simon and Schuster.
Murray, E. (1989). Measurement issues in the evaluation of psychopharmacological therapy. In S. Fisher & R. P. Greenberg (Eds.), The limits of biological treatments for psychological distress (pp. 39–68). Hillsdale, NJ: Erlbaum.
Quitkin, F. M., Rabkin, J. G., Gerald, J., Davis, J. M., & Klein, D. F. (2000). Validity of clinical trials of antidepressants. American Journal of Psychiatry, 157, 327–337.
Romero, F. (2016). Can the behavioral sciences self-correct? A social epistemic study. Studies in History and Philosophy of Science Part A, 60, 55–69.
Russo, F., & Williamson, J. (2007). Interpreting causality in the health sciences. International Studies in the Philosophy of Science, 21, 157–170.
Russo, F., & Williamson, J. (2011). Epistemic causality and evidence-based medicine. History and Philosophy of the Life Sciences, 33, 563–581.
Salamone, J. D. (2002). Antidepressants and placebos: Conceptual problems and research strategies. Prevention and Treatment, 5, 24c.
Senn, S. (2003). Disappointing dichotomies. Pharmaceutical Statistics, 2, 239–240.
Senn, S. (2015). Mastering variation: Variance components and personalised medicine. Statistics in Medicine, 35, 966–977.
Sklar, L. (1975). Methodological conservatism. Philosophical Review, 84, 384–400.
Stanford, K. (2006). Exceeding our grasp: Science, history, and the problem of unconceived alternatives. New York: Oxford University Press.
Starr, M., Chalmers, I., Clarke, M., & Oxman, A. D. (2009). The origins, evolution, and future of the Cochrane Database of Systematic Reviews. International Journal of Technology Assessment in Health Care, 25(S1), 182–195.
Stegenga, J. (2011). Is meta-analysis the platinum standard of evidence? Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 42, 497–507.
Stegenga, J. (2015). Measuring effectiveness. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 54, 62–71.
Stegenga, J., Graham, A., Kennedy, S. T., Jukola, S., & Bluhm, R. (2016). New directions in philosophy of medicine. The Bloomsbury Companion to Contemporary Philosophy of Medicine, 343, 23.
Thase, M. E. (2002). Antidepressant effects: The suit may be small, but the fabric is real. Prevention and Treatment, 5, 32c.
Turner, E., Matthews, A., Linardatos, E., Tell, R., & Rosenthal, R. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358, 252–260.
Uhlenhuth, E., & Park, L. (1964). The influence of medication (Imipramine) and doctor in relieving depressed psychoneurotic outpatients. Journal of Psychiatric Research, 2, 101–122.
van Assen, M. A., van Aert, R., & Wicherts, J. M. (2015). Meta-analysis using effect size distributions of only statistically significant studies. Psychological Methods, 20, 293.
van Aert, R. C., Wicherts, J. M., & van Assen, M. A. (2016). Conducting meta-analyses based on p values: Reservations and recommendations for applying p-uniform and p-curve. Perspectives on Psychological Science, 11, 713–729.
Wilson, I., Vernon, J., Guin, T., & Sandifer, M. (1963). A controlled study of treatment of depression. Journal of Neuropsychiatry, 4, 331–337.
Acknowledgements
This paper grew out of an extended conversation with Jacob Stegenga; indeed, the first draft of Sect. 1 was completed at his dining room table. I owe him a debt of gratitude for housing me during my time in Cambridge and for a number of lengthy discussions on the topic (which, of course, is not to saddle him with either the views expressed in the paper or any remaining errors). I would also like to thank Irving Kirsch and Saana Jukola for responding to queries and helping me clarify my understanding of their work. Finally, I would like to thank the two blind reviewers for their comments on this paper. One reviewer in particular forced me to resolve central ambiguities in the argument that I sensed were problematic, but couldn’t see my way around without their framing of the problem. The paper improved immensely because of their challenging and insightful objections.
Cite this article
Holman, B. In defense of meta-analysis. Synthese 196, 3189–3211 (2019). https://doi.org/10.1007/s11229-018-1690-2