Abstract
The journal Basic and Applied Social Psychology banned null hypothesis significance testing and confidence intervals. Was this justified, and if so, why? I address these questions with a focus on the different types of assumptions that compose the models on which p-values and confidence intervals are based. For the computation of p-values, in addition to problematic model assumptions, there is the further problem that p-values confound the implications of sample effect sizes and sample sizes. For the computation of confidence intervals, in contrast to the justification that they provide valuable information about the precision of the data, there is a triple confound involving three types of precision: measurement precision, precision of homogeneity, and sampling precision. Because all three can be estimated separately, provided the researcher has tested the reliability of the dependent variable, there is no reason to confound them via the computation of a confidence interval. Thus, the ban is justified with respect to both null hypothesis significance testing and confidence intervals.
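The two claims in the abstract lend themselves to quick numerical illustration. Below is a minimal Python sketch, not taken from the chapter: it assumes a one-sample t test (where the t statistic factors as the standardized sample effect times the square root of n) and the classical test theory decomposition of observed variance into true-score and error variance (as in Gulliksen, 1987; Lord and Novick, 1968). The helper names and example numbers are mine, chosen only for illustration.

```python
# Minimal sketch (not from the chapter); helper names and numbers are illustrative only.
import math
from scipy import stats

# (1) p-values confound effect size and sample size: for a one-sample t test,
# t = d * sqrt(n), where d is the standardized sample effect size.
def p_from_effect_and_n(d, n):
    """Two-sided p-value for a one-sample t test, given Cohen's d and sample size n."""
    t = d * math.sqrt(n)
    return 2 * stats.t.sf(abs(t), df=n - 1)

print(p_from_effect_and_n(d=0.80, n=10))    # large effect, small sample: p roughly .03
print(p_from_effect_and_n(d=0.08, n=1000))  # small effect, large sample: p roughly .01
# Comparable p-values arise from very different effect sizes, so p alone cannot separate them.

# (2) Given a reliability estimate, the three precisions can be reported separately
# rather than folded into one confidence interval. Under classical test theory,
# observed variance = true variance + error variance, and reliability = true/observed.
def three_precisions(sd_observed, reliability, n):
    sd_error = sd_observed * math.sqrt(1 - reliability)  # measurement (im)precision
    sd_true = sd_observed * math.sqrt(reliability)       # spread of true scores: homogeneity
    se_mean = sd_observed / math.sqrt(n)                  # sampling precision of the mean
    return sd_error, sd_true, se_mean

print(three_precisions(sd_observed=15.0, reliability=0.81, n=100))
```

The point of the sketch is only that each quantity is separately computable once reliability has been assessed, which is the condition stated in the abstract.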
Notes
- 1. An example is the book by Briggs (2016), who is a distinguished participant at TES2019.
- 2. Richard Morey, in his blog (http://bayesfactor.blogspot.com/2015/11/neyman-does-science-part-1.html), has documented how even Neyman was unable to avoid misusing p-values in this way, though he warned against it himself.
- 3. In fact, Rothman et al. (2013) provided arguments against random selection.
- 4. The reader may wonder about p-values as used in NHST versus as used to provide continuous indices of alleged justified worry about the model. Although both are problematic for the reasons described, null hypothesis significance tests are worse because of the dichotomous thinking they encourage and the dramatic overestimates of effect sizes in scientific literatures that they promote (see Locascio, 2017a for an explanation). If p-values were calculated but not used to draw any conclusions, their costs would be reduced, though still without providing any added benefits.
- 5. Of course, even this very limited conclusion depends on the model being correct, and as we already have seen, the model is not correct because of problematic inferential assumptions.
- 6. Assuming random sampling, an assumption most likely incorrect.
- 7. This argument should not be interpreted as indicating that contemporary researchers are at an overall disadvantage. In fact, contemporary researchers have many advantages over the researchers of yesteryear, including better knowledge, better technology, and others.
References
Bakker, M., van Dijk, A., Wicherts, J.M.: The rules of the game called psychological science. Perspect. Psychol. Sci. 7(6), 543–554 (2012)
Berk, R.A., Freedman, D.A.: Statistical assumptions as empirical commitments. In: Blomberg, T.G., Cohen, S. (eds.) Law, Punishment, and Social Control: Essays in Honor of Sheldon Messinger. 2nd edn., pp. 235–254. Aldine de Gruyter (2003)
Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces. Wiley, New York (1987)
Briggs, W.: Uncertainty: The Soul of Modeling, Probability and Statistics. Springer, New York (2016)
Cumming, G., Calin-Jageman, R.: Introduction to the New Statistics: Estimation, Open Science, and Beyond. Taylor and Francis Group, New York (2017)
Duhem, P.: The Aim and Structure of Physical Theory (P.P. Wiener, Trans.). Princeton University Press, Princeton (1954). (Original work published 1906)
Earp, B.D., Trafimow, D.: Replication, falsification, and the crisis of confidence in social psychology. Front. Psychol. 6, 1–11, Article 621 (2015)
Gillies, D.: Philosophical Theories of Probability. Routledge, London (2000)
Greenland, S.: Invited commentary: the need for cognitive science in methodology. Am. J. Epidemiol. 186, 639–645 (2017)
Gulliksen, H.: Theory of Mental Tests. Lawrence Erlbaum Associates Publishers, Hillsdale (1987)
Halsey, L.G., Curran-Everett, D., Vowler, S.L., Drummond, G.B.: The fickle P value generates irreproducible results. Nat. Methods 12, 179–185 (2015). https://doi.org/10.1038/nmeth.3288
Hubbard, R.: Corrupt Research: The Case for Reconceptualizing Empirical Management and Social Science. Sage Publications, Los Angeles (2016)
John, L.K., Loewenstein, G., Prelec, D.: Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23(5), 524–532 (2012)
Lakatos, I.: The Methodology of Scientific Research Programmes. Cambridge University Press, Cambridge (1978)
Lord, F.M., Novick, M.R.: Statistical Theories of Mental Test Scores. Addison-Wesley, Reading (1968)
Nguyen, H.T.: On evidential measures of support for reasoning with integrated uncertainty: a lesson from the ban of P-values in statistical inference. In: Huynh, V.N., et al. (eds.) Integrated Uncertainty in Knowledge Modeling and Decision Making. Lecture Notes in Artificial Intelligence, vol. 9978, pp. 3–15. Springer (2016)
Open Science Collaboration: Estimating the reproducibility of psychological science. Science 349(6251), aac4716 (2015). https://doi.org/10.1126/science.aac4716
Rothman, K.J., Gallacher, J.E.J., Hatch, E.E.: Why representativeness should be avoided. Int. J. Epidemiol. 42(4), 1012–1014 (2013)
Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11), 1359–1366 (2011)
Speelman, C.P., McGann, M.: Editorial: challenges to mean-based analysis in psychology: the contrast between individual people and general science. Front. Psychol. 7, 1234 (2016)
Trafimow, D.: Editorial. Basic Appl. Soc. Psychol. 36(1), 1–2 (2014)
Trafimow, D.: Implications of an initial empirical victory for the truth of the theory and additional empirical victories. Philos. Psychol. 30(4), 411–433 (2017a)
Trafimow, D.: Using the coefficient of confidence to make the philosophical switch from a posteriori to a priori inferential statistics. Educ. Psychol. Meas. 77(5), 831–854 (2017b)
Trafimow, D.: An a priori solution to the replication crisis. Philos. Psychol. 31, 1188–1214 (2018)
Trafimow, D.: A taxonomy of model assumptions on which P is based and implications for added benefit in the soft sciences (under submission)
Trafimow, D., Amrhein, V., Areshenkoff, C.N., Barrera-Causil, C.J., Beh, E.J., Bilgiç, Y.K., Bono, R., Bradley, M.T., Briggs, W.M., Cepeda-Freyre, H.A., Chaigneau, S.E., Ciocca, D.R., Correa, J.C., Cousineau, D., de Boer, M.R., Dhar, S.S., Dolgov, I., Gómez-Benito, J., Grendar, M., Grice, J.W., Guerrero-Gimenez, M.E., Gutiérrez, A., Huedo-Medina, T.B., Jaffe, K., Janyan, A., Karimnezhad, A., Korner-Nievergelt, F., Kosugi, K., Lachmair, M., Ledesma, R.D., Limongi, R., Liuzza, M.T., Lombardo, R., Marks, M.J., Meinlschmidt, G., Nalborczyk, L., Nguyen, H.T., Ospina, R., Perezgonzalez, J.D., Pfister, R., Rahona, J.J., Rodríguez-Medina, D.A., Romão, X., Ruiz-Fernández, S., Suarez, I., Tegethoff, M., Tejo, M., van de Schoot, R., Vankov, I.I., Velasco-Forero, S., Wang, T., Yamada, Y., Zoppino, F.C.M., Marmolejo-Ramos, F.: Manipulating the alpha level cannot cure significance testing. Front. Psychol. 9, 699 (2018)
Trafimow, D., MacDonald, J.A.: Performing inferential statistics prior to data collection. Educ. Psychol. Meas. 77(2), 204–219 (2017)
Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 37(1), 1–2 (2015)
Trafimow, D., Marks, M.: Editorial. Basic Appl. Soc. Psychol. 38(1), 1–2 (2016)
Trafimow, D., Wang, T., Wang, C.: Means and standard deviations, or locations and scales? That is the question! New Ideas Psychol. 50, 34–37 (2018)
Trafimow, D., Wang, T., Wang, C.: From a sampling precision perspective, skewness is a friend and not an enemy! Educ. Psychol. Meas. (in press)
Trueblood, J.S., Busemeyer, J.R.: A quantum probability account of order effects in inference. Cogn. Sci. 35, 1518–1552 (2011)
Trueblood, J.S., Busemeyer, J.R.: A quantum probability model of causal reasoning. Front. Psychol. 3, 138 (2012)
Valentine, J.C., Aloe, A.M., Lau, T.S.: Life after NHST: how to describe your data without “p-ing” everywhere. Basic Appl. Soc. Psychol. 37(5), 260–273 (2015)
Wasserstein, R.L., Lazar, N.A.: The ASA’s statement on p-values: context, process, and purpose. Am. Stat. 70, 129–133 (2016)
Woodside, A.: The good practices manifesto: overcoming bad practices pervasive in current research in business. J. Bus. Res. 69(2), 365–381 (2016)
Ziliak, S.T., McCloskey, D.N.: The Cult of Statistical Significance: How the Standard Error Costs us Jobs, Justice, and Lives. The University of Michigan Press, Ann Arbor (2016)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Trafimow, D. (2019). My Ban on Null Hypothesis Significance Testing and Confidence Intervals. In: Kreinovich, V., Sriboonchitta, S. (eds) Structural Changes and their Econometric Modeling. TES 2019. Studies in Computational Intelligence, vol 808. Springer, Cham. https://doi.org/10.1007/978-3-030-04263-9_3
DOI: https://doi.org/10.1007/978-3-030-04263-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04262-2
Online ISBN: 978-3-030-04263-9
eBook Packages: Intelligent Technologies and Robotics (R0)