Abstract
We propose an application of a new finite mixture multinomial conditional probit (FM-MNCP) model that accommodates preference heterogeneity and explicitly accounts for utility dependencies between choice alternatives, considering both local and background contrast effects. The latter is accomplished by using a one-factor structure for the segment-specific covariance matrices, which allows for nonzero off-diagonal covariance elements. We compare the model to a finite mixture multinomial independent probit (FM-MNIP) model that also accommodates heterogeneity but assumes independence. In this way, we assess the potential benefits of a model that additionally accounts for dependencies over a model that accommodates heterogeneity only. Our model comparison is based on empirical data for smoothies and is assessed in terms of fit, holdout validation, and market share predictions. One of the main findings of our empirical study is that allowing for utility dependencies may counterbalance the effects of considering heterogeneity, and vice versa. Additional findings from a simulation study indicate that the FM-MNCP model outperforms the FM-MNIP model with respect to parameter recovery.
References
Andrews RL, Ainslie A, Currim IS (2002a) An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity. J Mark Res 39:479–487
Andrews R, Ansari A, Currim I (2002b) Hierarchical Bayes versus finite mixture conjoint analysis models: a comparison of fit, prediction, and partworth recovery. J Mark Res 39:87–98
Andrews R, Currim I (2003) A comparison of segment retention criteria for finite mixture logit models. J Mark Res 40:235–243
Baur A, Klein R, Steinhardt C (2014) Model-based decision support for optimal brochure pricing: applying advanced analytics in the tour operating industry. OR Spectrum 36:557–584
Bolduc D, Fortin B, Fournier M (1996) The impact of incentive policies on practice location of doctors: a multinomial probit analysis. J Labor Econ 14:703–732
Boztuğ Y, Hildebrandt L, Raman K (2014) Detecting price thresholds in choice models using a semi-parametric approach. OR Spectrum 36:187–207
Bunch D (1991) Estimability in the multinomial probit model. Transp Res B 25B:1–12
Crabbe M, Jones B, Vandebroek M (2013) Comparing two-stage segmentation methods for choice data with a one-stage latent class choice analysis. Commun Stat Simul Comput 42:1188–1212
Daganzo C (1979) Multinomial probit: the theory and its applications to demand forecasting. Academic Press, New York
Dansie B (1985) Parameter estimability in the multinomial probit model. Transp Res B 19B:526–528
DeSarbo W, Ramaswamy V, Cohen S (1995) Market segmentation with choice-based conjoint analysis. Mark Lett 6:137–147
Dhar R, Nowlis S, Sherman S (2000) Trying hard or hardly trying: an analysis of context effects in choice. J Consum Psychol 9:189–200
Elrod T, Keane M (1995) A factor analytic probit model for representing the market structure of panel data. J Mark Res 32:1–16
Fennell G, Allenby GM, Yang S, Edwards Y (2003) The effectiveness of demographic and psychographic variables for explaining brand and product category use. Quant Mark Econ 1:223–244
Greene W, Hensher D (2013) Revealing additional dimensions of preference heterogeneity in a latent class mixed multinomial logit model. Appl Econ 45:1897–1902
Grün B, Leisch F (2008) Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. J Classif 25:225–247
Gustafsson A, Herrmann A, Huber F (2003) Conjoint analysis as an instrument of market research practice. In: Gustafsson A, Herrmann A, Huber F (eds) Conjoint measurement: methods and applications, 3rd edn. Springer, Berlin, pp 5–46
Haaijer R, Wedel M, Vriens M, Wansbeek TJ (1998) Utility covariances and context effects in conjoint MNP models. Mark Sci 17:236–252
Haase K, Müller S (2015) Insights into clients’ choice in preventive health care facility location planning. OR Spectrum 37:273–291
Huber J, Arora N, Johnson RM (1998) Capturing heterogeneity in consumer choices. ART Forum. American Marketing Association, Chicago
Hausman J, Wise D (1978) A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46:403–429
Karniouchina EV, Moore WL, van der Rhee B, Verma R (2009) Issues in the use of ratings-based versus choice-based conjoint analysis in operations management research. Eur J Oper Res 197:340–348
Keane M (1992) A note on identification in the multinomial probit model. J Bus Econ Stat 10:193–200
Keane M, Wasi N (2013) Comparing alternative models of heterogeneity in consumer choice behavior. J Appl Econom 28:1018–1045
Löffler S, Baier D (2015) Bayesian conjoint analysis in water park pricing: a new approach taking varying part worths for attribute levels into account. J Serv Sci Manag 8:46–56
McCullough RP (2009) Comparing hierarchical Bayes and latent class choice: practical issues for sparse data sets. In: 2009 Sawtooth software conference proceedings, pp 273–284
McLachlan G, Krishnan T (2007) The EM algorithm and extensions. Wiley, Hoboken
Moore WL (2004) A cross-validity comparison of rating-based and choice-based conjoint analysis models. Int J Res Mark 21:299–312
Moore WL, Gray-Lee J, Louviere JJ (1998) A cross-validity comparison of conjoint analysis and choice models at different levels of aggregation. Mark Lett 9:195–208
Natter M, Feurstein M (2002) Real world performance of choice-based conjoint models. Eur J Oper Res 137:448–458
Otter T, Tüchler R, Frühwirth-Schnatter S (2004) Capturing consumer heterogeneity in metric conjoint analysis using Bayesian mixture models. Int J Res Mark 21:285–297
Özkan C, Karaesmen F, Özekici S (2015) A revenue management problem with a choice model of consumer behavior. OR Spectrum 37:457–473
Paetz F (2013) Finite Mixture Multinomiales Probitmodell: Konzeption und Umsetzung. Springer, Wiesbaden
Paetz F, Steiner W (2014) A finite mixture multinomial probit model for choice based conjoint analysis: a simulation study. In: Proceedings of the 43rd EMAC conference, University of Valencia, Spain
Paetz F, Steiner W (2015) Die Berücksichtigung von Abhängigkeiten zwischen Alternativen in Finite Mixture Conjoint Choice Modellen: Eine Simulationsstudie. Mark ZFP 37:90–100
Raftery A (1995) Bayesian model selection in social research. Sociol Methodol 25:111–163
Rooderkerk R, Van Heerde H, Bijmolt T (2011) Incorporating context effects into a choice model. J Mark Res 48:767–780
Rossi P, Allenby G, McCulloch R (2005) Bayesian statistics in marketing. Wiley, New York
Sawtooth Software, Inc. (2013) The CBC system for choice-based conjoint analysis, Version 8, Sawtooth Software, Inc. http://www.sawtoothsoftware.com/download/techpap/cbctech.pdf. Accessed 28 Nov 2015
Simonson I, Tversky A (1992) Choice in context: tradeoff contrasts and extremeness aversion. J Mark Res 29:281–295
Steiner WJ (2010) A Stackelberg–Nash model for new product design. OR Spectrum 32:21–48
Steiner W, Hruschka H (2000) Conjointanalyse-basierte Produkt(linien)gestaltung unter Berücksichtigung von Konkurrenzreaktionen. OR Spektrum 22:71–95
Swait J (2003) Flexible covariance structures for categorical dependent variables through finite mixtures of generalized extreme value models. J Bus Econ Stat 21:80–87
Train K (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge
Train K (2008) EM algorithms for nonparametric estimation of mixing distributions. J Choice Model 1:40–69
Tuma M, Decker R (2013) Finite mixture models in market segmentation: a review and suggestions for best practices. Electron J Bus Res Methods 11:2–15
Vriens M, Wedel M, Wilms T (1996) Metric conjoint segmentation methods: a Monte Carlo comparison. J Mark Res 33:73–85
Wedel M, Kamakura W (2000) Market segmentation: conceptual and methodological foundations. Kluwer Academic Publishers, Norwell
Wedel M, Kamakura W, Arora N, Bemmaor A, Chiang J, Elrod T, Johnson R, Lenk P, Neslin S, Poulsen CS (1999) Discrete and continuous representations of unobserved heterogeneity in choice modeling. Mark Lett 10:219–232
Weeks M (1997) The multinomial probit model revisited: a discussion of parameter estimability, identification and specification testing. J Econ Surv 11:297–320
Xu H, Craig B (2009) A probit latent class model with general correlation structures for evaluating accuracy of diagnostic tests. Biometrics 65:1145–1155
Yai T, Iwakura S, Morichi S (1997) Multinomial probit with structured covariance for route choice behavior. Transp Res B 31:195–207
Acknowledgements
We thank isi GmbH for collecting and providing the data for the empirical conjoint choice study and three anonymous reviewers for their comments.
Appendix
1.1 Model identification
To ensure identification of the estimated models, we proceeded as follows. First, we used 20 choice sets per respondent consisting of pairwise distinct alternatives (except for the base alternative), to prevent respondent fatigue on the one hand and to provide as much respondent information as possible for model estimation on the other. Eighteen choice sets were used for estimation, while two choice sets served as holdouts for validation. With three real alternatives plus the base (none) alternative in each choice set, \(20 \cdot 3 +1 = 61\) pairwise different alternatives were displayed to a respondent over the whole choice task. Following Haaijer et al. (1998), a necessary condition for identification is that the number of covariance parameters in the FM-MNCP model must not exceed the total number of alternatives. According to this requirement, the chosen study design enables the identification of the FM-MNCP model for up to five segments. More specifically, with three attributes at four levels each, a total of ten covariance parameters need to be estimated per segment (formally, one less than the number of levels for each attribute plus one for the base option). Since the 18 estimation choice sets contain \(18 \cdot 3+1 = 55\) different alternatives, the inequality \(18 \cdot 3+1 \ge 10 \cdot G\) holds for up to \(G=5\) segments. As a sufficient condition for model identification, the Hessian matrices must further be checked for full rank and negative definiteness. We computed the Hessian matrices for all estimated models and found them to be negative definite in every case. Therefore, since both the necessary and the sufficient conditions were satisfied, identification of the estimated models could be guaranteed.
For conjoint studies with a larger number of attributes and/or a larger number of latent segments, and therefore a larger number of covariance parameters to be estimated in the FM-MNCP model, the necessary condition for model identification becomes more critical. On the one hand, the researcher can increase the number of alternatives per choice set, which typically lies between 3 and 5 (excluding the none option). This way, the number of different alternatives involved in the whole choice task can be increased. On the other hand, as is common for the specification of interaction terms between attributes in practical applications, it may also be possible to limit the occurrence of context effects to specific pairs of attributes. In this case, the number of covariance parameters can be reduced. In addition, a recent meta-analysis on applications of finite mixture models indicates that a moderate number of segments (on average three to four) is commonly sufficient to describe empirical data adequately (Tuma and Decker 2013).
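The counting argument and the Hessian check described above can be illustrated with a minimal Python sketch. The function names and the tolerance are hypothetical choices made for illustration and are not part of the original study; the counting rule itself follows the description in the text.

```python
import numpy as np

def max_identifiable_segments(n_choice_sets, n_real_alternatives,
                              n_attributes, n_levels):
    """Necessary condition: covariance parameters <= number of alternatives."""
    # Distinct alternatives: all real alternatives plus the shared base (none) option.
    n_alternatives = n_choice_sets * n_real_alternatives + 1
    # Per segment: (levels - 1) parameters per attribute plus one for the base option.
    n_cov_per_segment = n_attributes * (n_levels - 1) + 1
    return n_alternatives, n_cov_per_segment, n_alternatives // n_cov_per_segment

def is_negative_definite(hessian, tol=1e-8):
    """Sufficient condition check: all eigenvalues strictly below -tol (assumed tolerance)."""
    hessian = np.asarray(hessian, dtype=float)
    symmetric = 0.5 * (hessian + hessian.T)  # guard against numerical asymmetry
    return bool(np.all(np.linalg.eigvalsh(symmetric) < -tol))

# Design of the study: 18 estimation choice sets, 3 real alternatives each,
# 3 attributes with 4 levels each.
print(max_identifiable_segments(18, 3, 3, 4))
```

A negative definite Hessian also has full rank, so the eigenvalue check covers both parts of the sufficient condition.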
1.2 Model estimation
The EM algorithm consists of I iterations and incorporates a Gibbs sampler with R iterations in the E-step and the BFGS algorithm in the M-step.
Following established stopping criteria for the EM algorithm, we terminate the algorithm in iteration i if the changes in the parameters are smaller than an a priori specified bound (cf. Wedel and Kamakura 2000, p. 88 or Train 2008, p. 44). To use several different starting points, every estimation was started with an initial vector \(\theta _0\), whose \(\Theta \) values are drawn component by component from the standard normal distribution. Pretests indicated that \(R=800\) iterations for the Gibbs sampler in the E-step and a maximum of 5000 iterations for the BFGS algorithm in the M-step were appropriate. We used the BFGS algorithm provided by the “optim” function in R. The EM algorithm was stopped if none of the estimated parameters changed by more than 0.03 over the last three iterations.
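The stopping rule can be sketched as follows. This is an illustrative Python sketch, not the code used in the study: the function names are hypothetical, and "over the last three iterations" is read here as requiring every consecutive parameter change within the last three EM updates to stay within the bound, which is an assumption about the exact rule.

```python
import numpy as np

def draw_start(n_params, rng):
    """Starting vector with components drawn from the standard normal distribution."""
    return rng.standard_normal(n_params)

def em_converged(history, bound=0.03, window=3):
    """True if no parameter moved by more than `bound` in each of the last `window` updates.

    `history` is a list of parameter vectors, one per completed EM iteration.
    """
    if len(history) < window + 1:
        return False  # not enough iterations to judge yet
    recent = np.asarray(history[-(window + 1):], dtype=float)
    step_sizes = np.abs(np.diff(recent, axis=0))  # change per parameter and update
    return bool(np.all(step_sizes <= bound))
```

In an EM loop, the current parameter vector would be appended to `history` after each M-step and the loop would end once `em_converged(history)` returns `True`.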
1.2.1 Estimation: E-step
The utility difference vectors \(z_k\) follow an \((M-1)\cdot P\)-variate Gaussian distribution. A closer examination reveals that this Gaussian distribution is truncated on \((-\infty , 0)^{(M-1)\cdot P}\). Following Xu and Craig (2009, p. 4), the E-step consists of a Gibbs sampler that cycles through the conditional distributions. Specifically, we run R iterations of the following two steps in the ith iteration of the EM algorithm:
1. Let \(g_k\in \left\{ 1,\ldots ,G\right\} \) describe the membership of the kth choice pattern to segment g. \(g_k\) follows a multinomial distribution and can therefore be drawn from a multinomial distribution with the following G parameters: for \(g=1,\ldots ,G\),

$$\begin{aligned} p_{g_k=g}^{(ir)}=\frac{\pi ^{(i-1)}_{g} \cdot \phi _{(M-1)P}\left( z_k^{(i-1)}; A_k X \beta _{g}^{(i-1)}, A_k\varSigma _{g}^{(i-1)}A_k^\mathrm{T}\right) }{\sum \nolimits _{g'=1}^G\pi ^{(i-1)}_{g'} \cdot \phi _{(M-1)P}\left( z_k^{(i-1)}; A_k X \beta _{g'}^{(i-1)}, A_k\varSigma _{g'}^{(i-1)}A_k^\mathrm{T}\right) }, \end{aligned} \tag{10}$$

and \(\sum _{g=1}^{G} p_{g_k=g}^{(ir)} =1\) for given i and r.

2. Based on the segment memberships \(g_k^{(ir)}\) resulting from step 1, the corresponding utility difference vectors \(z_k^{(ir)}\) can be determined. The difference vectors \(z_k^{(ir)}=(z_{k1}^{(ir)},\ldots , z_{k((M-1)P)}^{(ir)} )^\mathrm{T}\) are drawn component by component q, \(q=1,\ldots ,(M-1)\cdot P\), from a univariate truncated Gaussian distribution \(TN(\mu _q^{(ir)}, \sigma _q^{(ir)})\) on \((-\infty ,0)\), where the computation of \(\mu _q^{(ir)}\) and \(\sigma _q^{(ir)}\) follows the approach of Xu and Craig (2009):

$$\begin{aligned} \sigma _q^{(ir)}= \frac{1}{\left( A_k\varSigma _g^{(i-1)}A_k^\mathrm{T}\right) ^{-1}_{[q,q]}}, \end{aligned} \tag{11}$$

where [q, q] denotes the qth diagonal element of the \((M-1)\cdot P \times (M-1)\cdot P\) matrix \((A_k\varSigma _g^{(i-1)}A_k^\mathrm{T})^{-1}\), and

$$\begin{aligned} \mu _q^{(ir)}= \left( A_k X \beta _g^{(i-1)}\right) _{[q]}-\sigma _q^{(ir)}\left( A_k\varSigma _g^{(i-1)}A_k^\mathrm{T}\right) ^{-1}_{[q,-q]} \left( z_{k[-q]}^{(i-1)}-\left( A_k X \beta _g^{(i-1)}\right) _{[-q]}\right) , \end{aligned} \tag{12}$$

where [q] denotes the qth component of the vector, \([-q]\) the corresponding vector without the qth component, and \([q,-q]\) the qth row of the corresponding matrix without the qth element.
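The two Gibbs steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, `means[g]` stands for \(A_k X \beta _g\) and `covs[g]` for \(A_k\varSigma _g A_k^\mathrm{T}\), the truncated draw uses inverse-CDF sampling (one of several possible samplers), and the sweep updates each component using the most recent values of the other components, as is usual for a Gibbs sampler.

```python
import math
from statistics import NormalDist
import numpy as np

_STD_NORMAL = NormalDist()

def mvn_logpdf(z, mean, cov):
    """Log density of a multivariate normal distribution."""
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(z) * math.log(2.0 * math.pi) + logdet + quad)

def membership_probs(z, pis, means, covs):
    """Segment membership probabilities in the form of Eq. (10)."""
    log_w = np.array([math.log(p) + mvn_logpdf(z, m, c)
                      for p, m, c in zip(pis, means, covs)])
    w = np.exp(log_w - log_w.max())  # subtract the maximum for numerical stability
    return w / w.sum()

def conditional_moments(q, z, mean, cov):
    """Conditional mean and variance of component q, following Eqs. (11) and (12)."""
    precision = np.linalg.inv(cov)
    var_q = 1.0 / precision[q, q]
    others = np.arange(len(z)) != q
    mu_q = mean[q] - var_q * precision[q, others] @ (z[others] - mean[others])
    return mu_q, var_q

def draw_truncated_below_zero(mu, var, rng):
    """Inverse-CDF draw from a normal distribution truncated to (-inf, 0)."""
    sd = math.sqrt(var)
    upper = _STD_NORMAL.cdf((0.0 - mu) / sd)
    u = min(max(rng.uniform(0.0, upper), 1e-300), 1.0 - 1e-16)  # keep u inside (0, 1)
    return min(mu + sd * _STD_NORMAL.inv_cdf(u), 0.0)  # guard against rounding above 0

def gibbs_sweep(z, pis, means, covs, rng):
    """One Gibbs iteration: draw the segment, then redraw z component by component."""
    probs = membership_probs(z, pis, means, covs)
    g = rng.choice(len(pis), p=probs)
    z = z.copy()
    for q in range(len(z)):
        mu_q, var_q = conditional_moments(q, z, means[g], covs[g])
        z[q] = draw_truncated_below_zero(mu_q, var_q, rng)
    return g, z
```

Repeating `gibbs_sweep` R times for each choice pattern and storing the draws would yield the samples from which the conditional expectations are formed.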
After R iterations of the Gibbs sampling procedure and under consideration of a burn-in phase of R / 2 iterations (also see Xu and Craig 2009), the conditional expectations of the segment memberships and the utility difference vectors in iteration i of the EM algorithm are calculated as follows (for all \(g=1,\ldots ,G\)):
Hence, the relative segment masses \(\pi _g^{(i)}\) can be derived as
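As a hedged illustration of how such post-burn-in quantities can be formed, the sketch below averages the Gibbs draws retained after discarding the first R / 2 iterations. The array layout and the function name are assumptions made for illustration, and plain averaging of retained draws is one common Monte Carlo estimator, not necessarily the exact form used in the study.

```python
import numpy as np

def post_burn_in_expectations(g_draws, z_draws, n_segments):
    """Average Gibbs draws after a burn-in of half the iterations.

    g_draws: shape (R, K), sampled segment labels 0..G-1 per iteration and pattern.
    z_draws: shape (R, K, D), sampled utility difference vectors.
    Returns membership estimates (K, G), conditional means of z (K, G, D),
    and relative segment masses (G,).
    """
    g_draws = np.asarray(g_draws)
    z_draws = np.asarray(z_draws, dtype=float)
    n_iter = g_draws.shape[0]
    keep_g = g_draws[n_iter // 2:]
    keep_z = z_draws[n_iter // 2:]
    n_patterns, dim = keep_z.shape[1], keep_z.shape[2]

    membership = np.zeros((n_patterns, n_segments))
    z_mean = np.full((n_patterns, n_segments, dim), np.nan)  # NaN if never visited
    for g in range(n_segments):
        mask = keep_g == g                    # (R/2, K): draws assigned to segment g
        membership[:, g] = mask.mean(axis=0)  # share of retained draws in segment g
        counts = mask.sum(axis=0)
        sums = (keep_z * mask[:, :, None]).sum(axis=0)
        visited = counts > 0
        z_mean[visited, g] = sums[visited] / counts[visited, None]
    segment_masses = membership.mean(axis=0)  # average membership over patterns
    return membership, z_mean, segment_masses
```

Entries of `z_mean` remain `NaN` for pattern and segment combinations that never occur among the retained draws, so downstream code would need to handle them explicitly.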
1.2.2 Estimation: M-step
The conditional expectations \(E(z_k^{(i)}|g_k=g, y_k, \beta _g, \varSigma _g)\) of the utility difference vectors \(z_k\) and the relative segment masses \(\pi _g^{(i)}\) form the input for the M-step and are treated as fixed quantities during the maximization of the log likelihood. Based on these quantities, the BFGS algorithm determines the segment-specific part-worth vectors \(\beta _{g}^{(i)}\) (and covariance vectors \(\sigma _g^{(i)}\)) in iteration i of the EM algorithm.
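A simplified sketch of such an M-step update is given below. It is an illustration under strong assumptions rather than the estimation code of the study: it uses SciPy's `minimize` with `method="BFGS"` in place of R's `optim`, updates only the part-worth vector of one segment, holds the covariance matrix of the utility differences fixed, and all names (`designs` for the matrices \(A_k X\), `z_bar` for the expected utility differences, `weights` for the membership weights) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def neg_expected_loglik(beta, designs, z_bar, weights, omega_inv):
    """Negative Gaussian log likelihood in beta, up to constants, with z held fixed."""
    total = 0.0
    for design_k, z_k, w_k in zip(designs, z_bar, weights):
        resid = z_k - design_k @ beta
        total += 0.5 * w_k * (resid @ omega_inv @ resid)
    return total

def m_step_beta(designs, z_bar, weights, omega, beta_start, max_iter=5000):
    """BFGS update of one segment's part-worths for a fixed covariance matrix omega."""
    omega_inv = np.linalg.inv(omega)
    result = minimize(
        neg_expected_loglik,
        beta_start,
        args=(designs, z_bar, weights, omega_inv),
        method="BFGS",
        options={"maxiter": max_iter},
    )
    return result.x
```

With a fixed covariance matrix this objective is quadratic, so a closed-form weighted least squares solution exists; a numerical optimizer such as BFGS becomes necessary once the covariance parameters are estimated jointly with the part-worths.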
1.3 Simulation study
To gain further evidence on the performance of FM-MNCP versus FM-MNIP models, we conducted a simulation study and compared both models with respect to model fit, predictive validity, and parameter recovery. Parameter recovery as a further performance criterion can only be addressed with simulated data, since the true parameters are known there (as opposed to empirical data). The simulation study manipulates commonly used experimental factors for finite mixture models, i.e., the number of segments, the separation between segments, and the relative segment masses (cf. Vriens et al. 1996; Andrews et al. 2002b; Andrews and Currim 2003). In addition, we included the covariance structure (zero versus nonzero structure) as another experimental factor. A zero covariance structure implies full independence between alternatives and therefore coincides with the independence assumption of the FM-MNIP model.
The simulated data comprise individual choice patterns for 600 respondents as well as artificially generated “true” part-worth utilities and covariance vectors (the latter only for the FM-MNCP model). The data generation process follows the approaches of Vriens et al. (1996) and Andrews and Currim (2003). For model comparison, we computed the model-specific means of several performance measures both per factor level and across factor levels. Model fit is measured by the log likelihood, while parameter recovery is assessed by the root mean square error \(\hbox {RMSE}(\beta ) =(1/(G\cdot S) \sum _{g=1}^G \sum _{s=1}^S (\beta _{gs} - \widehat{\beta }_{gs})^2 )^{1/2}\) between the true segment-specific part-worths \(\beta _{g}\) and the estimated segment-specific part-worths \(\widehat{\beta }_{g}\). Predictive validity is evaluated by the root mean square error \(\hbox {RMSE}(V) =(1/(J\cdot W\cdot M) \sum _{j=1}^J \sum _{w=1}^W \sum _{m=1}^M(V_{jwm} - \widehat{V}_{jwm})^2 )^{1/2}\) between the true deterministic utilities of holdout alternatives \(V_{jwm}\) and the estimated ones \(\widehat{V}_{jwm}\) (cf. Andrews et al. 2002b; Andrews and Currim 2003), where W denotes the number of holdout choice sets consisting of M alternatives each. To be comparable to our empirical study, the choice tasks consisted of 18 choice sets for estimation and 2 holdouts for validation, 4 alternatives including the none option per choice set, and 3 attributes with four levels each for designing the alternatives. With our four experimental factors at two factor levels each and one additional replication for each of the resulting 16 treatments, we obtained a total of 32 observations for each performance statistic and model type.
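Both error measures share the same form, a root mean square error over all entries of an array, and can be computed with one small helper. The Python sketch below is illustrative only: the function name is hypothetical, and estimated segments are assumed to be already matched to the true segments before the comparison.

```python
import numpy as np

def rmse(true_values, estimates):
    """Root mean square error over all entries of two equally shaped arrays.

    For RMSE(beta), pass arrays of shape (G, S); for RMSE(V), pass arrays of
    shape (J, W, M). Averaging over all entries reproduces the 1/(G*S) and
    1/(J*W*M) factors in the definitions.
    """
    true_values = np.asarray(true_values, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    if true_values.shape != estimates.shape:
        raise ValueError("true_values and estimates must have the same shape")
    return float(np.sqrt(np.mean((true_values - estimates) ** 2)))
```

For example, `rmse(beta_true, beta_hat)` with two arrays of shape `(G, S)` gives the parameter recovery measure, where `beta_true` and `beta_hat` are placeholder names.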
Table 8 displays the means of all three performance measures both at the individual factor level and overall across factor levels.
Across factor levels, the FM-MNCP model significantly (\(\alpha \le 5\%\)) outperforms the FM-MNIP model with respect to model fit (\(\hbox {LL}^\mathrm{MNCP}=-10{,}396.289>-10{,}458.533=\hbox {LL}^\mathrm{MNIP}\)) and parameter recovery (\(\hbox {RMSE}(\beta )^\mathrm{MNCP}=0.529 < 0.671= \hbox {RMSE}(\beta )^\mathrm{MNIP}\)) (see row “overall”). Concerning predictive accuracy, the difference in the root mean square error statistic between the models is not significant, although the FM-MNCP model shows a better (overall) \(\hbox {RMSE}(V)\) value (\(\hbox {RMSE}(V)^\mathrm{MNCP}=29.209 < 31.513 = \hbox {RMSE}(V)^\mathrm{MNIP}\)).
Considering the performance of the models at individual factor levels, the main results can be described as follows: For two segments, the FM-MNCP model provides a significantly better predictive accuracy than the FM-MNIP model (\(\alpha \le 5\%\)). Parameter recovery is always better under the FM-MNCP model, and the differences from the FM-MNIP model in \(\hbox {RMSE}(\beta )\) are significant at all factor levels except for a zero covariance structure and equal segment masses (\(\alpha \le 10\%\)). For three segments as well as for a large separation of segments, the FM-MNCP model also significantly (\(\alpha \le 10\%\)) outperforms the FM-MNIP model with regard to model fit.
In contrast to our empirical study, the simulation study is based on comparisons of FM-MNCP and FM-MNIP models that were estimated for the same number of segments. For example, if the true number of segments was two, we estimated the two-segment FM-MNCP solution and compared it to the two-segment FM-MNIP solution. This procedure is common practice in simulation studies (see, e.g., Andrews et al. 2002b). The FM-MNIP model performs worse than the FM-MNCP model in many situations when the number of segments is held fixed a priori. Under those conditions, the FM-MNIP model seems unable to overcome its independence assumption by accounting for heterogeneity only. The differences between the two models may turn out even larger with a higher power of the design (i.e., when more replications under each treatment are used). For a full discussion of the simulation study, see Paetz and Steiner (2014) or Paetz and Steiner (2015).
1.4 Utility functions
Cite this article
Paetz, F., Steiner, W.J. The benefits of incorporating utility dependencies in finite mixture probit models. OR Spectrum 39, 793–819 (2017). https://doi.org/10.1007/s00291-017-0478-y
DOI: https://doi.org/10.1007/s00291-017-0478-y