
The benefits of incorporating utility dependencies in finite mixture probit models

Regular Article, OR Spectrum

Abstract

We propose an application of a new finite mixture multinomial conditional probit (FM-MNCP) model that accommodates preference heterogeneity and explicitly accounts for utility dependencies between choice alternatives, considering both local and background contrast effects. The latter is accomplished by using a one-factor structure for the segment-specific covariance matrices, which allows for nonzero off-diagonal covariance elements. We compare the model to a finite mixture multinomial independent probit (FM-MNIP) model that also accommodates heterogeneity but assumes independence. In this way, we address the potential benefits of a model that additionally accounts for dependencies over a model that accommodates heterogeneity only. Our model comparison is based on empirical data for smoothies and is assessed in terms of fit, holdout validation, and market share predictions. One of the main findings of our empirical study is that allowing for utility dependencies may counterbalance the effects of considering heterogeneity, and vice versa. Additional findings from a simulation study indicate that the FM-MNCP model outperforms the FM-MNIP model with respect to parameter recovery.


References

  • Andrews RL, Ainslie A, Currim IS (2002a) An empirical comparison of logit choice models with discrete versus continuous representations of heterogeneity. J Mark Res 39:479–487

  • Andrews R, Ansari A, Currim I (2002b) Hierarchical Bayes versus finite mixture conjoint analysis models: a comparison of fit, prediction, and partworth recovery. J Mark Res 39:87–98

  • Andrews R, Currim I (2003) A comparison of segment retention criteria for finite mixture logit models. J Mark Res 40:235–243

  • Baur A, Klein R, Steinhardt C (2014) Model-based decision support for optimal brochure pricing: applying advanced analytics in the tour operating industry. OR Spectrum 36:557–584

  • Bolduc D, Fortin B, Fournier M (1996) The impact of incentive policies on practice location of doctors: a multinomial probit analysis. J Labor Econ 14:703–732

  • Boztuğ Y, Hildebrandt L, Raman K (2014) Detecting price thresholds in choice models using a semi-parametric approach. OR Spectrum 36:187–207

  • Bunch D (1991) Estimability in the multinomial probit model. Transp Res B 25:1–12

  • Crabbe M, Jones B, Vandebroek M (2013) Comparing two-stage segmentation methods for choice data with a one-stage latent class choice analysis. Commun Stat Simul Comput 42:1188–1212

  • Daganzo C (1979) Multinomial probit: the theory and its applications to demand forecasting. Academic Press, New York

  • Dansie B (1985) Parameter estimability in the multinomial probit model. Transp Res B 19:526–528

  • DeSarbo W, Ramaswamy V, Cohen S (1995) Market segmentation with choice-based conjoint analysis. Mark Lett 6:137–147

  • Dhar R, Nowlis S, Sherman S (2000) Trying hard or hardly trying: an analysis of context effects in choice. J Consum Psychol 9:189–200

  • Elrod T, Keane M (1995) A factor analytic probit model for representing the market structure of panel data. J Mark Res 32:1–16

  • Fennell G, Allenby GM, Yang S, Edwards Y (2003) The effectiveness of demographic and psychographic variables for explaining brand and product category use. Quant Mark Econ 1:223–244

  • Greene W, Hensher D (2013) Revealing additional dimensions of preference heterogeneity in a latent class mixed multinomial logit model. Appl Econ 45:1897–1902

  • Grün B, Leisch F (2008) Identifiability of finite mixtures of multinomial logit models with varying and fixed effects. J Classif 25:225–247

  • Gustafson A, Herrmann A, Huber F (2003) Conjoint analysis as an instrument of market research practice. In: Gustafson A, Herrmann A, Huber F (eds) Conjoint measurement: methods and applications, 3rd edn. Springer, Berlin, pp 5–46

  • Haaijer R, Wedel M, Vriens M, Wansbeek TJ (1998) Utility covariances and context effects in conjoint MNP models. Mark Sci 17:236–252

  • Haase K, Müller S (2015) Insights into clients’ choice in preventive health care facility location planning. OR Spectrum 37:273–291

  • Hausman J, Wise D (1978) A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46:403–429

  • Huber J, Arora N, Johnson RM (1998) Capturing heterogeneity in consumer choices. ART Forum. American Marketing Association, Chicago

  • Karniouchina EV, Moore WL, van der Rhee B, Verma R (2009) Issues in the use of ratings-based versus choice-based conjoint analysis in operations management research. Eur J Oper Res 197:340–348

  • Keane M (1992) A note on identification in the multinomial probit model. J Bus Econ Stat 10:193–200

  • Keane M, Wasi N (2013) Comparing alternative models of heterogeneity in consumer choice behavior. J Appl Econom 28:1018–1045

  • Löffler S, Baier D (2015) Bayesian conjoint analysis in water park pricing: a new approach taking varying part worths for attribute levels into account. J Serv Sci Manag 8:46–56

  • McCullough RP (2009) Comparing hierarchical Bayes and latent class choice: practical issues for sparse data sets. In: 2009 Sawtooth Software conference proceedings, pp 273–284

  • McLachlan G, Krishnan T (2007) The EM algorithm and extensions. Wiley, Hoboken

  • Moore WL (2004) A cross-validity comparison of rating-based and choice-based conjoint analysis models. Int J Res Mark 21:299–312

  • Moore WL, Gray-Lee J, Louviere JJ (1998) A cross-validity comparison of conjoint analysis and choice models at different levels of aggregation. Mark Lett 9:195–208

  • Natter M, Feurstein M (2002) Real world performance of choice-based conjoint models. Eur J Oper Res 137:448–458

  • Otter T, Tüchler R, Frühwirth-Schnatter S (2004) Capturing consumer heterogeneity in metric conjoint analysis using Bayesian mixture models. Int J Res Mark 21:285–297

  • Özkan C, Karaesmen F, Özekici S (2015) A revenue management problem with a choice model of consumer behavior. OR Spectrum 37:457–473

  • Paetz F (2013) Finite Mixture Multinomiales Probitmodell: Konzeption und Umsetzung. Springer, Wiesbaden

  • Paetz F, Steiner W (2014) A finite mixture multinomial probit model for choice based conjoint analysis: a simulation study. In: Proceedings of the 43rd EMAC conference, University of Valencia, Spain

  • Paetz F, Steiner W (2015) Die Berücksichtigung von Abhängigkeiten zwischen Alternativen in Finite Mixture Conjoint Choice Modellen: Eine Simulationsstudie. Mark ZFP 37:90–100

  • Raftery A (1995) Bayesian model selection in social research. Sociol Methodol 25:111–163

  • Rooderkerk R, Van Heerde H, Bijmolt T (2011) Incorporating context effects into a choice model. J Mark Res 48:767–780

  • Rossi P, Allenby G, McCulloch R (2005) Bayesian statistics in marketing. Wiley, New York

  • Sawtooth Software, Inc. (2013) The CBC system for choice-based conjoint analysis, Version 8. Sawtooth Software, Inc. http://www.sawtoothsoftware.com/download/techpap/cbctech.pdf. Accessed 28 Nov 2015

  • Simonson I, Tversky A (1992) Choice in context: tradeoff contrasts and extremeness aversion. J Mark Res 29:281–295

  • Steiner WJ (2010) A Stackelberg–Nash model for new product design. OR Spectrum 32:21–48

  • Steiner W, Hruschka H (2000) Conjointanalyse-basierte Produkt(linien)gestaltung unter Berücksichtigung von Konkurrenzreaktionen. OR Spektrum 22:71–95

  • Swait J (2003) Flexible covariance structures for categorical dependent variables through finite mixtures of generalized extreme value models. J Bus Econ Stat 21:80–87

  • Train K (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge

  • Train K (2008) EM algorithms for nonparametric estimation of mixing distributions. J Choice Model 1:40–69

  • Tuma M, Decker R (2013) Finite mixture models in market segmentation: a review and suggestions for best practices. Electron J Bus Res Methods 11:2–15

  • Vriens M, Wedel M, Wilms T (1996) Metric conjoint segmentation methods: a Monte Carlo comparison. J Mark Res 33:73–85

  • Wedel M, Kamakura W (2000) Market segmentation: conceptual and methodological foundations. Kluwer Academic Publishers, Norwell

  • Wedel M, Kamakura W, Arora N, Bemmaor NJC, Elrod T, Johnson R, Lenk P, Neslin S, Poulson C (1999) Discrete and continuous representations of unobserved heterogeneity in choice modeling. Mark Lett 10:219–232

  • Weeks M (1997) The multinomial probit model revisited: a discussion of parameter estimability, identification and specification testing. J Econ Surv 11:297–320

  • Xu H, Craig B (2009) A probit latent class model with general correlation structures for evaluating accuracy of diagnostic tests. Biometrics 65:1145–1155

  • Yai T, Iwakura S, Morichi S (1997) Multinomial probit with structured covariance for route choice behavior. Transp Res B 31:195–207

Acknowledgements

We thank isi GmbH for collecting and providing the data for the empirical conjoint choice study and three anonymous reviewers for their comments.

Author information

Correspondence to Friederike Paetz.

Appendix

1.1 Model identification

To ensure identification of the estimated models, we proceeded as follows. First, we used 20 choice sets per respondent, consisting of pairwise disjoint alternatives (except for the base alternative), to avoid respondent fatigue while still providing as much respondent information as possible for model estimation. Eighteen choice sets were used for estimation, and two served as holdouts for validation. With three real alternatives plus the base (none) alternative in each choice set, \(20 \cdot 3 +1 = 61\) pairwise different alternatives were displayed to each respondent over the whole choice task, \(18 \cdot 3 + 1 = 55\) of them in the estimation choice sets. Following Haaijer et al. (1998), a necessary condition for identification is that the total number of covariance parameters in the FM-MNCP model does not exceed the total number of alternatives. Under this requirement, the chosen study design allows the FM-MNCP model to be identified for up to five segments. More specifically, with three attributes at four levels each, ten covariance parameters must be estimated per segment (formally, one less than the number of levels for each attribute, \(3 \cdot (4-1) = 9\), plus one for the base option), so the inequality \(18 \cdot 3+1 \ge 10 \cdot G\) holds for up to \(G=5\) segments (\(55 \ge 50\), whereas \(55 < 60\) for \(G=6\)). As a sufficient condition for model identification, the Hessian matrices must further be checked for full rank and negative definiteness. We computed the Hessian matrix of each estimated model and found it to be negative definite in all cases. Since both the necessary and the sufficient conditions were satisfied, identification of the estimated models is ensured.
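The counting argument and the Hessian check above can be sketched as follows. This is a minimal illustration, not the authors' code; the function names are hypothetical, and negative definiteness is tested here via the eigenvalues of the symmetrized Hessian, which is one standard way to do it.

```python
import numpy as np


def n_cov_params(levels):
    """Covariance parameters per segment: (levels - 1) per attribute, plus one for the none option."""
    return sum(n_levels - 1 for n_levels in levels) + 1


def max_identifiable_segments(n_est_sets, alts_per_set, levels):
    """Largest G with n_est_sets * alts_per_set + 1 >= n_cov_params(levels) * G."""
    n_alternatives = n_est_sets * alts_per_set + 1  # disjoint real alternatives plus the shared none option
    return n_alternatives // n_cov_params(levels)


def is_negative_definite(hessian):
    """Sufficient-condition check: all eigenvalues of the (symmetrized) Hessian are negative."""
    h = np.asarray(hessian, dtype=float)
    eigenvalues = np.linalg.eigvalsh((h + h.T) / 2.0)
    return bool(np.all(eigenvalues < 0))


# Study design from the text: 18 estimation sets, 3 real alternatives, 3 attributes at 4 levels.
print(n_cov_params([4, 4, 4]), max_identifiable_segments(18, 3, [4, 4, 4]))
```

For the design described above this gives 10 covariance parameters per segment and a maximum of 5 segments, matching \(55 \ge 10 \cdot G\).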

For conjoint studies with more attributes and/or more latent segments, and therefore more covariance parameters to be estimated in the FM-MNCP model, the necessary condition for model identification becomes more restrictive. On the one hand, the researcher can increase the number of alternatives per choice set, which typically lies between 3 and 5 (excluding the none option), and thereby the number of different alternatives involved in the whole choice task. On the other hand, analogous to the common practice of specifying interaction terms only between selected attributes, it may also be possible to restrict context effects to specific pairs of attributes, which reduces the number of covariance parameters. In addition, a recent meta-analysis of applications of finite mixture models indicates that a moderate number of segments (on average three to four) is commonly sufficient to describe empirical data adequately (Tuma and Decker 2013).
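Both remedies can be quantified with the same inequality, \(n_{\text{sets}} \cdot a + 1 \ge c \cdot G\), where \(a\) is the number of real alternatives per set and \(c\) the number of covariance parameters per segment. The sketch below is illustrative only; the function names are hypothetical, and the reduced value \(c = 7\) in the usage line is a made-up example of restricting context effects, not a figure from the study.

```python
import math


def min_alternatives_per_set(n_est_sets, n_cov_per_segment, n_segments):
    """Smallest a with n_est_sets * a + 1 >= n_cov_per_segment * n_segments."""
    needed = n_cov_per_segment * n_segments - 1
    return max(1, math.ceil(needed / n_est_sets))


def max_segments(n_est_sets, alts_per_set, n_cov_per_segment):
    """Largest G satisfying the necessary condition for a given design."""
    return (n_est_sets * alts_per_set + 1) // n_cov_per_segment


# Remedy 1: six segments with 10 covariance parameters each need 4 real alternatives per set.
print(min_alternatives_per_set(18, 10, 6))
# Remedy 2 (hypothetical): cutting the covariance parameters to 7 per segment allows up to 7 segments.
print(max_segments(18, 3, 7))
```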

1.2 Model estimation

The EM algorithm consists of I iterations and incorporates Gibbs sampling with R iterations in the E-step and the BFGS algorithm in the M-step.

In line with established stopping criteria for the EM algorithm, we terminate the EM algorithm in iteration i if the changes in the parameters are smaller than an a priori specified bound (cf. Wedel and Kamakura 2000, p. 88 or Train 2008, p. 44). Every estimation was run from several different starting points, each given by an initial vector \(\theta _0\) whose components are drawn independently from the standard normal distribution. Several pretests indicated that \(R=800\) iterations of the Gibbs sampler in the E-step and a maximum of 5000 iterations of the BFGS algorithm in the M-step are appropriate. We used the BFGS algorithm provided by the “optim” function in R. The EM algorithm was stopped if none of the estimated parameters changed by more than 0.03 over the last three iterations.
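The random starting values and the stopping rule can be sketched in Python as below. The text does not spell out how "changed by more than 0.03 over the last three iterations" is measured; this sketch assumes one reading, namely that each of the last three successive iteration-to-iteration changes is at most 0.03 for every parameter. Function names are hypothetical.

```python
import numpy as np


def random_start(n_params, rng):
    """Starting vector theta_0 with components drawn from the standard normal distribution."""
    return rng.standard_normal(n_params)


def em_converged(history, tol=0.03, window=3):
    """Assumed stopping rule: no parameter moved by more than `tol` in each of the last `window` iterations.

    history: list of parameter vectors, one per completed EM iteration.
    """
    if len(history) < window + 1:
        return False  # not enough iterations yet to judge the last `window` changes
    recent = np.asarray(history[-(window + 1):], dtype=float)
    return bool(np.all(np.abs(np.diff(recent, axis=0)) <= tol))
```

In an EM driver, one would append the current parameter vector after each M-step and break out of the loop as soon as `em_converged(history)` returns `True`, repeating the whole run for several `random_start` draws.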

1.2.1 Estimation: E-step

The utility difference vectors \(z_k\) follow an \((M-1)\cdot P\)-variate Gaussian distribution that is truncated on \((-\infty , 0)^{(M-1)\cdot P}\). Following Xu and Craig (2009), p. 4, the E-step consists of a Gibbs sampler that cycles through the conditional distributions. Specifically, in the ith iteration of the EM algorithm we run R iterations of the following two steps:

  1.

    Let \(g_k\in \left\{ 1,\ldots ,G\right\} \) denote the segment membership of the kth choice pattern. \(g_k\) follows a multinomial distribution and can therefore be drawn from a multinomial distribution with the following G probabilities: for \(g=1,\ldots ,G\),

    $$\begin{aligned} p_{g_k=g}^{(ir)}=\frac{\pi ^{(i-1)}_{g} \cdot \phi _{(M-1)P}\left( z_k^{(i-1)}; A_k X \beta _{g}^{(i-1)}, A_k\varSigma _{g}^{(i-1)}A_k^\mathrm{T}\right) }{\sum \nolimits _{h=1}^G\pi ^{(i-1)}_{h} \cdot \phi _{(M-1)P}\left( z_k^{(i-1)}; A_k X \beta _{h}^{(i-1)}, A_k\varSigma _{h}^{(i-1)}A_k^\mathrm{T}\right) }, \end{aligned}$$
    (10)

    and \(\sum _{g=1}^{G} p_{g_k=g}^{(ir)} =1\) for given i and r.

  2.

    Given the segment memberships \(g_k^{(ir)}\) drawn in step 1, the corresponding utility difference vectors \(z_k^{(ir)}\) are drawn. The difference vectors \(z_k^{(ir)}=(z_{k1}^{(ir)},\ldots , z_{k((M-1)P)}^{(ir)} )^\mathrm{T}\) are drawn component by component q, \(q=1,\ldots ,(M-1)\cdot P\), from a univariate truncated Gaussian distribution \(TN(\mu _q^{(ir)}, \sigma _q^{(ir)})\) on \((-\infty ,0)\), where \(\mu _q^{(ir)}\) and \(\sigma _q^{(ir)}\) (the conditional mean and variance) are computed following the approach of Xu and Craig (2009):

    $$\begin{aligned} \sigma _q^{(ir)}= \frac{1}{\left( A_k\varSigma _g^{(i-1)}A_k^\mathrm{T}\right) ^{-1}_{[q,q]}}, \end{aligned}$$
    (11)

    where \([q,q]\) denotes the qth diagonal element of the \((M-1)\cdot P \times (M-1)\cdot P\) matrix \((A_k\varSigma _g^{(i-1)}A_k^\mathrm{T})^{-1}\), and

    $$\begin{aligned} \mu _q^{(ir)}= \left( A_k X \beta _g^{(i-1)}\right) _{[q]}-\sigma _q^{(ir)}\left( A_k\varSigma _g^{(i-1)}A_k^\mathrm{T}\right) ^{-1}_{[q,-q]} \left( z_{k[-q]}^{(i-1)}-\left( A_k X \beta _g^{(i-1)}\right) _{[-q]}\right) , \end{aligned}$$
    (12)

    where [q] denotes the qth component of the vector, \([-q]\) the corresponding vector without the qth component, and \([q,-q]\) the qth row of the corresponding matrix without the qth element.
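The two Gibbs steps above can be sketched for a single choice pattern k as follows, taking `means[g]` \(= A_k X \beta_g\) and `covs[g]` \(= A_k\varSigma_g A_k^\mathrm{T}\) as precomputed inputs. This is an illustrative sketch, not the authors' implementation; it assumes NumPy and SciPy, uses a log-sum-exp normalization for Eq. (10), and, as in a standard Gibbs sweep, conditions each component on the most recent values of the others (Eq. (12) as printed conditions on the values from the previous iteration).

```python
import numpy as np
from scipy.stats import multivariate_normal, truncnorm


def draw_segment(z, means, covs, pi, rng):
    """Step 1, Eq. (10): draw the segment of one choice pattern given its current z_k.

    Returns a 0-based segment index and the membership probabilities.
    """
    log_w = np.array([
        np.log(p) + multivariate_normal.logpdf(z, mean=m, cov=c)
        for p, m, c in zip(pi, means, covs)
    ])
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    return rng.choice(len(w), p=w), w


def draw_z(z, mean, cov, rng):
    """Step 2, Eqs. (11)-(12): component-wise draws from TN(mu_q, sigma_q) on (-inf, 0)."""
    z = np.array(z, dtype=float)  # copy, so the caller's vector is not modified
    prec = np.linalg.inv(cov)  # (A_k Sigma_g A_k^T)^{-1}
    for q in range(z.size):
        var_q = 1.0 / prec[q, q]  # Eq. (11): conditional variance
        rest = np.arange(z.size) != q  # mask selecting the [-q] components
        mu_q = mean[q] - var_q * prec[q, rest] @ (z[rest] - mean[rest])  # Eq. (12)
        sd_q = np.sqrt(var_q)
        upper = (0.0 - mu_q) / sd_q  # truncation point 0 on the standardized scale
        z[q] = truncnorm.rvs(-np.inf, upper, loc=mu_q, scale=sd_q,
                             size=1, random_state=rng)[0]
    return z
```

One Gibbs iteration r for pattern k is then `g, _ = draw_segment(z, means, covs, pi, rng)` followed by `z = draw_z(z, means[g], covs[g], rng)`, with `rng = np.random.default_rng(...)`.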

After R iterations of the Gibbs sampler, discarding a burn-in phase of the first R / 2 iterations (see also Xu and Craig 2009), the conditional expectations of the segment memberships and of the utility difference vectors in iteration i of the EM algorithm are calculated as follows (for all \(g=1,\ldots ,G\)):

$$\begin{aligned} E\left( g_k^{(i)} = g|z_k, y_k, \beta _g, \varSigma _g\right)= & {} \frac{2}{R} \cdot \sum \limits _{r=R/2+1}^R \mathrm{I}\left( g_k^{(ir)}=g\right) \quad \text{and} \nonumber \\ E\left( z_k^{(i)}|g_k = g, y_k, \beta _g, \varSigma _g\right)= & {} \frac{2}{R} \cdot \sum \limits _{r=R/2+1}^R z_k^{(ir)}, \end{aligned}$$
(13)

where \(\mathrm{I}(\cdot )\) denotes the indicator function.

Hence, the relative segment masses \(\pi _g^{(i)}\) can be derived as

$$\begin{aligned} \pi _g^{(i)}= \frac{1}{J} \cdot \sum \limits _{k=1}^{M^P}n_k\cdot E\left( g_k^{(i)}=g|z_k, y_k, \beta _g, \varSigma _g\right) . \end{aligned}$$
(14)
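Equations (13) and (14) reduce to simple averages over the retained draws. The sketch below assumes the segment draws are stored as an integer array of shape (R, K) with 0-based labels and R even, and that \(n_k\) counts the respondents with choice pattern k so that \(J=\sum_k n_k\); these storage conventions and names are hypothetical.

```python
import numpy as np


def membership_probabilities(g_draws, n_segments):
    """Eq. (13): share of post-burn-in draws in which pattern k was assigned to segment g.

    g_draws: integer array of shape (R, K) with 0-based segment labels, R even.
    Returns an array of shape (K, G).
    """
    g_draws = np.asarray(g_draws)
    kept = g_draws[g_draws.shape[0] // 2:]  # draws r = R/2+1, ..., R (1-based), i.e. R/2 draws
    return np.stack([(kept == g).mean(axis=0) for g in range(n_segments)], axis=1)


def segment_masses(post, n_k):
    """Eq. (14): pi_g = (1/J) * sum_k n_k * P(g_k = g), assuming J = sum_k n_k."""
    n_k = np.asarray(n_k, dtype=float)
    return (n_k[:, None] * post).sum(axis=0) / n_k.sum()
```

For example, with four draws for two patterns, `membership_probabilities([[0, 1], [0, 1], [1, 1], [1, 0]], 2)` keeps only the last two rows and yields probabilities `[[0, 1], [0.5, 0.5]]`.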

1.2.2 Estimation: M-step

The conditional expectations \(E(z_k^{(i)}|g_k=g, y_k, \beta _g, \varSigma _g)\) of the utility difference vectors \(z_k\) and the relative segment masses \(\pi _g^{(i)}\) form the input to the M-step and are treated as fixed quantities during the maximization of the log likelihood. Given these quantities, the BFGS algorithm determines the segment-specific part-worth vectors \(\beta _{g}^{(i)}\) (and covariance vectors \(\sigma _g^{(i)}\)) in iteration i of the EM algorithm.
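The text does not state the exact M-step objective. As a hedged sketch, the code below assumes a weighted Gaussian log likelihood of the expected difference vectors \(E(z_k)\) under segment g, with weights \(n_k \cdot E(g_k=g)\), optimizes only \(\beta_g\) with the covariance held fixed, and uses SciPy's BFGS in place of R's `optim`. All names and the choice of objective are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal


def m_step_beta(z_bar, designs, covs, weights, beta0):
    """Assumed M-step for one segment: maximize a weighted Gaussian log likelihood in beta.

    z_bar[k]   : expected utility difference vector E(z_k) from the E-step (fixed)
    designs[k] : A_k X, mapping part-worths to mean utility differences
    covs[k]    : A_k Sigma_g A_k^T, held fixed in this sketch
    weights[k] : n_k * E(g_k = g), held fixed
    """
    def neg_log_lik(beta):
        total = 0.0
        for z, d, c, w in zip(z_bar, designs, covs, weights):
            total += w * multivariate_normal.logpdf(z, mean=d @ beta, cov=c)
        return -total

    result = minimize(neg_log_lik, beta0, method="BFGS", options={"maxiter": 5000})
    return result.x
```

With a fixed covariance, this objective is a weighted generalized least-squares criterion, so BFGS converges quickly; jointly estimating \(\sigma_g\) would require extending the parameter vector and rebuilding the covariance matrices inside `neg_log_lik`.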

1.3 Simulation study

To gain further evidence on the performance of FM-MNCP versus FM-MNIP models, we conducted a simulation study and compared both models with respect to model fit, predictive validity, and parameter recovery. Parameter recovery can only be addressed as a performance criterion with simulated data, because the true parameters are known only there, not for empirical data. The simulation study manipulates experimental factors commonly used for finite mixture models, i.e., the number of segments, the separation between segments, and the relative segment masses (cf. Vriens et al. 1996; Andrews et al. 2002b; Andrews and Currim 2003). In addition, we included the covariance structure (zero versus nonzero) as a further experimental factor. A zero covariance structure implies full independence between alternatives and therefore coincides with the independence assumption of the FM-MNIP model.
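The resulting full factorial design (four factors at two levels each, with two runs per treatment, as described in the next paragraph) can be enumerated as below. The level labels "small" separation and "unequal" masses are assumed counterparts of the "large" and "equal" levels named in the results; the dictionary keys are hypothetical.

```python
from itertools import product

# Two levels per experimental factor; "small" and "unequal" are assumed labels.
FACTORS = {
    "segments": (2, 3),
    "separation": ("small", "large"),
    "segment_masses": ("equal", "unequal"),
    "covariance": ("zero", "nonzero"),
}


def simulation_design(replications=2):
    """Full factorial design: 2^4 = 16 treatments, each run `replications` times."""
    treatments = list(product(*FACTORS.values()))
    return [
        {**dict(zip(FACTORS, levels)), "replication": r}
        for levels in treatments
        for r in range(1, replications + 1)
    ]


design = simulation_design()
print(len(design))  # 16 treatments x 2 runs
```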

The simulated data comprise individual choice patterns for 600 respondents as well as artificially generated “true” part-worth utilities and covariance vectors (the latter only for the FM-MNCP model). The data generation process follows the approaches of Vriens et al. (1996) and Andrews and Currim (2003). For model comparison, we computed the model-specific means of several performance measures both per factor level and across factor levels. Model fit is measured by the log likelihood, while parameter recovery is assessed by the root mean square error \(\hbox {RMSE}(\beta ) =(1/(G\cdot S) \sum _{g=1}^G \sum _{s=1}^S (\beta _{gs} - \widehat{\beta }_{gs})^2 )^{1/2}\) between the true segment-specific part-worths \(\beta _{g}\) and the estimated segment-specific part-worths \(\widehat{\beta }_{g}\). Predictive validity is evaluated by the root mean square error \(\hbox {RMSE}(V) =(1/(J\cdot W\cdot M) \sum _{j=1}^J \sum _{w=1}^W \sum _{m=1}^M(V_{jwm} - \widehat{V}_{jwm})^2 )^{1/2}\) between the true deterministic utilities of the holdout alternatives \(V_{jwm}\) and the estimated ones \(\widehat{V}_{jwm}\) (cf. Andrews et al. 2002b; Andrews and Currim 2003), where W denotes the number of holdout choice sets consisting of M alternatives each. For comparability with our empirical study, the choice tasks consisted of 18 choice sets for estimation and 2 holdouts for validation, with 4 alternatives per choice set (including the none option) and 3 attributes with four levels each for designing the alternatives. With four experimental factors at two levels each and one additional replication of each of the resulting \(2^4 = 16\) treatments, we obtained a total of 32 observations for each performance statistic and model type.
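Both RMSE measures are the square root of a mean over all entries, so one helper covers \(\hbox{RMSE}(\beta)\) over a \(G \times S\) array and \(\hbox{RMSE}(V)\) over a \(J \times W \times M\) array. Because estimated segment labels are arbitrary, the sketch also matches estimated to true segments by trying every permutation; this label-matching step is an assumption not described in the text, and the function names are hypothetical.

```python
import itertools

import numpy as np


def rmse(true, est):
    """Root mean square error over all entries (G x S part-worths or J x W x M utilities)."""
    true, est = np.asarray(true, dtype=float), np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((true - est) ** 2)))


def rmse_beta_matched(beta_true, beta_est):
    """RMSE(beta) after aligning segment labels (assumed step; feasible for small G)."""
    beta_true = np.asarray(beta_true, dtype=float)
    beta_est = np.asarray(beta_est, dtype=float)
    n_segments = beta_true.shape[0]
    return min(
        rmse(beta_true, beta_est[list(perm)])
        for perm in itertools.permutations(range(n_segments))
    )
```

With G equal to 2 or 3 in this design, enumerating all permutations is cheap; for many segments an assignment algorithm would be the natural replacement.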

Table 8 displays the means of all three performance measures both at the individual factor level and overall across factor levels.

Table 8 Simulation study: model-specific means of performance measures for model fit, parameter recovery, and predictive validity

Across factor levels, the FM-MNCP model significantly (\(\alpha \le 5\%\)) outperforms the FM-MNIP model with respect to model fit (\(\hbox {LL}^\mathrm{MNCP}=-10{,}396.289>-10{,}458.533=\hbox {LL}^\mathrm{MNIP}\)) and parameter recovery (\(\hbox {RMSE}(\beta )^\mathrm{MNCP}=0.529 < 0.671= \hbox {RMSE}(\beta )^\mathrm{MNIP}\)) (see row “overall”). Concerning predictive accuracy, the difference in the \(\hbox {RMSE}(V)\) statistic between the models is not significant, although the FM-MNCP model attains a better overall value (\(\hbox {RMSE}(V)^\mathrm{MNCP}=29.209 < 31.513 = \hbox {RMSE}(V)^\mathrm{MNIP}\)).

Considering the performance of the models at individual factor levels, the main results are as follows. For two segments, the FM-MNCP model provides significantly better predictive accuracy than the FM-MNIP model (\(\alpha \le 5\%\)). Parameter recovery is always better under the FM-MNCP model, and the differences from the FM-MNIP model in \(\hbox {RMSE}(\beta )\) are significant (\(\alpha \le 10\%\)) at all factor levels except for a zero covariance structure and equal segment masses. For three segments, as well as for a large separation of segments, the FM-MNCP model also significantly (\(\alpha \le 10\%\)) outperforms the FM-MNIP model with regard to model fit.

In contrast to our empirical study, the simulation study compares FM-MNCP and FM-MNIP models that were estimated for the same number of segments. For example, if the true number of segments was two, we estimated the two-segment FM-MNCP solution and compared it with the two-segment FM-MNIP solution. This procedure is common practice in simulation studies (see, e.g., Andrews et al. 2002b). Evidently, the FM-MNIP model performs worse than the FM-MNCP model in many situations when the number of segments is fixed a priori. Under those conditions, the FM-MNIP model does not seem able to compensate for its independence assumption by accounting for heterogeneity alone. The differences between the two models might turn out even larger with a more powerful design (i.e., with more replications per treatment). For a full discussion of the simulation study, see Paetz and Steiner (2014) or Paetz and Steiner (2015).

1.4 Utility functions

Fig. 1 Segment-specific utility functions for the one- and two-segment solutions of the FM-MNCP and FM-MNIP models

About this article

Cite this article

Paetz, F., Steiner, W.J. The benefits of incorporating utility dependencies in finite mixture probit models. OR Spectrum 39, 793–819 (2017). https://doi.org/10.1007/s00291-017-0478-y

DOI: https://doi.org/10.1007/s00291-017-0478-y