In psychology, measurement instruments are constructed from scales, which are obtained on the basis of exploratory and confirmatory factor analysis. The literature offers various recommendations regarding how these techniques should be used during the scale construction process. Some authors suggest using exploratory factor analysis on the entire data set, while others advise performing an internal cross-validation by randomly splitting the data set in two and then performing either exploratory factor analysis on both parts or exploratory factor analysis on the first part and confirmatory factor analysis on the second. These recommendations diverge, and there is no consensus on which method yields the best results. In this paper, we analyze this issue in light of the prediction versus accommodation debate and argue that the answer to this question depends on one’s conception of the criteria that should be used to achieve the goals of the scientific enterprise.
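The internal cross-validation strategy begins with a random partition of the data set into two halves. The following is a minimal Python sketch of that first step only, assuming the data set is a sequence of respondent records; the function name `random_split_half`, the seeded generator, and the ten hypothetical respondents are illustrative assumptions, and the factor analyses themselves are left out.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split_half(data: Sequence[T], seed: int) -> Tuple[List[T], List[T]]:
    """Randomly partition `data` into two disjoint halves (sizes differ by at most one).

    Hypothetical helper for illustration; it performs only the random split,
    not exploratory or confirmatory factor analysis.
    """
    rng = random.Random(seed)        # seeded so that a given split can be reproduced
    indices = list(range(len(data)))
    rng.shuffle(indices)             # random permutation of the record indices
    cut = len(indices) // 2
    first = [data[i] for i in indices[:cut]]
    second = [data[i] for i in indices[cut:]]
    return first, second


# Hypothetical data set D: ten respondents identified by ID.
D = [f"resp{i:02d}" for i in range(10)]
calibration, validation = random_split_half(D, seed=1)
# EFA would be run on `calibration`, and either EFA or CFA on `validation`.
```

Because the partition is random, two researchers who split the same data set (here, two different seeds) will generally obtain different halves. This is why the outcome of a split-half cross-validation can depend on the particular split.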
This paper is primarily concerned with the issue of model selection during scale construction in psychology. That said, our analysis is not restricted to that field: it can be extended to other disciplines that use factor-analytic methods to develop measurement instruments (e.g. sociology, education), although some discipline-specific differences might arise.
Some items might be reverse-scored.
See Hood (2013) for an analysis of the issue of realism with respect to measurement models.
In such a measurement model, the items are not used to predict the latent factors. As we will see, a factorial structure is a space where the items are the points.
Note that Penny’s and Rosie’s partitions of D can be different.
Hitchcock and Sober’s analysis takes place within an instrumentalist perspective; as such, they hold that science should aim at predictive accuracy (see Hitchcock and Sober 2004, pp. 2–3). It should be noted, however, that our analysis is independent of the realism/instrumentalism debate. As we will see, our analysis relies only on the assumptions that empirical adequacy is one of the goals of the scientific enterprise and that predictive accuracy is one indicator, among others, of a theory’s overall empirical adequacy.
See also Gardner (1982) for a discussion of novelty.
Conceptually, part of the unexplained variance might be due to something other than error.
Even though the assumption of continuity is often violated, it has been shown that, provided the assumption of normality holds, the consequences of violating the assumption of continuity are likely negligible (cf. Byrne 2012).
There are other extraction techniques that can be used in cases where the assumption of normality is violated (cf. Flora et al. 2012). Consequently, despite their importance, these two conditions are not necessary for the use of factor analysis during scale construction.
Michell (1997) argued that, given the violation of the assumption of continuity, the measurement of psychological attributes is not possible (see also Michell 2003, 2004). His argument rests on the assumption that only continuous quantities can be measured: since the hypothesis that psychological attributes are continuous is not tested, one is not justified in believing that psychological attributes can be measured. Borsboom and Mellenbergh (2004) answered this objection and showed that this hypothesis is in fact tested, though not in isolation.
It should be mentioned that some authors have argued that measurement models should be obtained with algorithms that search for pure measurement models rather than with factor-analytic techniques (e.g. Silva et al. 2006; Kummerfeld et al. 2014; see also Murray-Watters and Glymour 2015). The analysis of these alternative techniques is beyond the scope of this paper. We will concentrate on factor analysis given its central place in scale construction in the psychology literature.
Once a scale has been properly validated, researchers can use different measurement models and study the relationships between the latent factors through structural equation modeling. We leave the analysis of model selection with respect to structural models for future research.
To some extent, trying to replicate results from previous studies can be understood as a prediction that the results can be generalized.
Thus understood, replication amounts to the reproduction of a result using the same technique.
This would not be the case if one were to use principal component analysis instead of EFA.
I would like to thank Stephan Hartmann for valuable comments and suggestions made on a previous draft of this paper. I am also grateful to anonymous referees, whose comments and suggestions helped to improve this article, and to Sarah-Geneviève Trépanier, for enlightening discussions on the subject. This research was financially supported by the Social Sciences and Humanities Research Council of Canada.
