Abstract
Bayesian multinomial probit models have been widely used to analyze discrete choice data. Existing methods suffer from shortcomings in parameter identification or from sensitivity of posterior inference to the labeling of choice objects. The main task of this study is to deal with both problems simultaneously. First, we propose a globally and symmetrically identified multinomial probit model whose covariance matrix is positive semidefinite. However, it is difficult to design an efficient Bayesian algorithm that fits this model directly, because it is infeasible to sample a positive semidefinite matrix from a regular distribution. We therefore use a projection technique to develop a projected model, which is equivalent to the proposed model but is equipped with a positive definite covariance matrix. Finally, based on the projected model, we develop an efficient Bayesian fitting algorithm using modern Markov chain Monte Carlo techniques. Through simulation studies and an analysis of clothes detergent purchase data, we demonstrate that our approach not only solves the identifiability problem but also shows robustness and satisfactory estimation accuracy, at comparable computational cost.
References
Albert, J., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88, 669–679 (1993)
Anderson, S.P., de Palma, A., Thisse, J.F.: Discrete Choice Theory of Product Differentiation. MIT Press, Cambridge (1992)
Ben-Akiva, M., Lerman, S.R.: Discrete Choice Analysis: Theory and Application to Predict Travel Demand. MIT Press, Cambridge (1985)
Burgette, L., Nordheim, E.: The trace restriction: an alternative identification strategy for the Bayesian multinomial probit model. J. Bus. Econ. Stat. 30, 404–410 (2012)
Burgette, L., Puelz, D., Hahn, P.: A symmetric prior for multinomial probit models. Bayesian Anal. 16(3), 991–1008 (2021)
de Bekker-Grob, E.W., Ryan, M., Gerard, K.: Discrete choice experiments in health economics: a review of the literature. Health Econ. 21, 145–172 (2012)
Fong, D.K.H., Kim, S., Chen, Z., et al.: A Bayesian multinomial probit model for the analysis of panel choice data. Psychometrika 81, 161–183 (2016)
Hausman, J.A., Wise, D.A.: A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46, 403–426 (1978)
Hoff, P.D.: A First Course in Bayesian Statistical Methods. Springer, New York (2009)
Imai, K., van Dyk, D.: A Bayesian analysis of the multinomial probit model using marginal data augmentation. J. Econom. 124, 311–334 (2005a)
Imai, K., van Dyk, D.: MNP: R package for fitting the multinomial probit model. J. Stat. Softw. 14, 1–32 (2005b)
Keane, M.P.: A note on identification in the multinomial probit model. J. Bus. Econ. Stat. 10, 193–200 (1992)
Kruschke, J.K.: Bayesian estimation supersedes the t test. J. Exp. Psychol. Gen. 142(2), 573–603 (2013)
McCulloch, R., Rossi, P.: An exact likelihood analysis of the multinomial probit model. J. Econom. 64, 207–240 (1994)
McCulloch, R., Polson, N., Rossi, P.: A Bayesian analysis of the multinomial probit model with fully identified parameters. J. Econom. 99, 173–193 (2000)
Nobile, A.: A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Stat. Comput. 8, 229–242 (1998)
Quinn, K.M., Martin, A.D., Whitford, A.B.: Voter choice in multi-party democracies: a test of competing theories and models. Am. J. Political Sci. 43(4), 1231–1247 (1999)
Small, K.A., Rosen, H.S.: Applied welfare economics with discrete choice models. Econometrica 49, 105–130 (1981)
Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–540 (1987)
Yang, S., Allenby, G.M.: Modeling interdependent consumer preferences. J. Mark. Res. 40, 282–294 (2003)
Yu, P.L.H.: Bayesian analysis of order-statistics models for ranking data. Psychometrika 65, 281–299 (2000)
Acknowledgements
This research is partially supported by two grants from the National Natural Science Foundation of China (11501287 and 71571096) and three grants from the Research Grants Council General Research Fund of the Hong Kong Special Administrative Region, China (14303819, 14203915 and 14173817).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix 1: Prediction bias regarding BPH’s symmetric MNP model
To start with, we recall the MNP model defined in Eq. (7),
where W, X and \(\beta ,\Sigma \) are defined as in Sect. 3.1. As discussed before, the parameters in the model (20) are not identified unless some restrictions are imposed on them. Previous studies usually fix the first diagonal entry \(\sigma _{11}\) of \(\Sigma \) at some positive value. For ease of explanation, we first scale the model (20) with the restriction \(\sigma _{11}=1\) and take it as the baseline model. Then, for the model (20) scaled by another identification method, denoted by
with \({\tilde{\sigma }}_{11}=\alpha ^{2}\) where \(\alpha \) is a fixed positive value, we have \({\tilde{\Sigma }}=\alpha ^{2}\Sigma \) and \({\tilde{\beta }}=\alpha \beta \). Furthermore, we have \({\tilde{W}}=\alpha W\) in distribution, and for \( j=1,2,\ldots , p\),
Next, we consider BPH’s partial trace restriction on the model (20). Suppose the b-th diagonal entry of the covariance matrix is singled out and the sum of the remaining diagonal entries is restricted to \(p-1\), which yields the following identified model
where \(tr(\Sigma ^{(b)}_{-b})=p-1\). As in the model (21), there exists \(\alpha _{b}^{2}=\frac{p-1}{tr(\Sigma _{-b})}\) such that \(\beta ^{(b)}=\alpha _{b}\beta \), \(\Sigma ^{(b)}=\alpha _{b}^{2}\Sigma \) and \(W^{(b)}=\alpha _{b}W\) in distribution. BPH’s symmetric MNP model averages the above models and obtains the posterior mean estimates of the coefficient vector and the covariance matrix as follows,
where \(f_{j}\) denotes the posterior probability of the event \(\{b=j\}\). Under BPH’s uniform prior on the parameter \(b\in \{1,2,\ldots , p\}\), we have \(f_{j}>0\) for all j. By Jensen’s inequality,
the equality holds if and only if \(\alpha _{1}=\alpha _{2}=\cdots =\alpha _{p}\). For simplicity, denote
For a given covariate matrix X, denote \(X\beta \) by \(\mu \) with elements \(\mu _{i}, i=1,2,\ldots ,p\). Then the latent random vector \(W^{*}\) defined by BPH’s symmetric MNP model follows the normal distribution
Taking \(\alpha =\alpha _{\sigma }\) in the MNP model defined in Eq. (21), the latent random vector
Without loss of generality, we assume \(\mu _{1}\ge \mu _{k}, k=2,\ldots ,p\), with at least one strict inequality. In addition, suppose that not all diagonal elements of \(\Sigma \) are equal, so that not all \(\alpha _{i}, i=1,2,\ldots ,p\), are equal. Then we have \(\alpha _{\beta }<\alpha _{\sigma }\) and further
where \(P_{1}=\{2,3,\ldots ,p\}\). The third equality holds because \(W^{*}-\alpha _{\beta }\mu ={\tilde{W}}-\alpha _{\sigma }\mu \) in distribution. The last equality holds because \({\tilde{W}}=\alpha _{\sigma }W\) in distribution. The above inequality says that the probability of choosing the object with label 1 under BPH’s symmetric MNP model is smaller than that under the baseline model. In other words, BPH’s symmetric MNP model distorts the true choice probabilities unless all the diagonal entries of the true covariance matrix are equal.
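The argument of this appendix can be checked numerically. The following is a minimal sketch, not the procedure used in the paper, under several labeled assumptions: \(\Sigma \) is taken to be diagonal with illustrative values, the weights \(f_{j}\) are set to 1/p for illustration, and \(\alpha _{\beta }\) and \(\alpha _{\sigma }\) are read as the weighted mean and the root weighted mean square of the \(\alpha _{j}\), which is the pair compared by Jensen's inequality. All function names are hypothetical. Because choice probabilities are invariant to a common rescaling of the utilities, \(W^{*}\sim N(\alpha _{\beta }\mu ,\alpha _{\sigma }^{2}\Sigma )\) gives the same choices as \(N((\alpha _{\beta }/\alpha _{\sigma })\mu ,\Sigma )\), so the sketch simply shrinks \(\mu \) by that ratio and reuses the same random draws for both models.

```python
import math
import random

def partial_trace_scales(diag_sigma):
    """alpha_b = sqrt((p - 1) / tr(Sigma_{-b})) for each label b."""
    p = len(diag_sigma)
    total = sum(diag_sigma)
    return [math.sqrt((p - 1) / (total - s_bb)) for s_bb in diag_sigma]

def averaged_scales(alphas, weights):
    """Weighted mean (alpha_beta) and root weighted mean square (alpha_sigma)."""
    alpha_beta = sum(f * a for f, a in zip(weights, alphas))
    alpha_sigma = math.sqrt(sum(f * a * a for f, a in zip(weights, alphas)))
    return alpha_beta, alpha_sigma

def prob_choose_first(mu, diag_sigma, shrink, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(W_1 >= W_k for all k), W ~ N(shrink * mu, Sigma)."""
    rng = random.Random(seed)  # same seed gives common random numbers across calls
    sds = [math.sqrt(s) for s in diag_sigma]
    hits = 0
    for _ in range(n_draws):
        w = [shrink * m + rng.gauss(0.0, sd) for m, sd in zip(mu, sds)]
        if w[0] >= max(w[1:]):
            hits += 1
    return hits / n_draws

# Illustrative (assumed) inputs: unequal variances, label 1 has the largest mean.
diag_sigma = [1.0, 4.0, 16.0]
mu = [1.0, 0.2, -0.3]
weights = [1 / 3, 1 / 3, 1 / 3]

alphas = partial_trace_scales(diag_sigma)
alpha_beta, alpha_sigma = averaged_scales(alphas, weights)
shrink = alpha_beta / alpha_sigma  # strictly below 1 when the alphas differ

p_baseline = prob_choose_first(mu, diag_sigma, 1.0)
p_averaged = prob_choose_first(mu, diag_sigma, shrink)
print(alpha_beta, alpha_sigma, p_baseline, p_averaged)
```

With unequal diagonal entries the ratio `shrink` falls below 1, and the estimated probability of choosing label 1 under the averaged model does not exceed the baseline estimate. Setting all entries of `diag_sigma` equal makes the two coincide.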
Appendix 2: Prior for covariance matrix with trace augmented restriction
Suppose the \(p\times p\) matrix \({\tilde{A}}\sim \text {InvWishart}(\nu ,\Psi )\), where \(\Psi \) is a positive definite \(p\times p\) matrix and \(\nu \) \((\ge p)\) is the degrees of freedom. Transform \({\tilde{A}}\) to \((\alpha ^{2}, A)\) as follows,
where I is the \(p\times p\) identity matrix and J is the \(p\times p\) matrix with all entries equal to 1. Let \(1\{\cdot \}\) be the indicator function; then the joint distribution of \(\alpha ^{2}\) and A is
and the marginal distribution of A is
Proof
Set \({\tilde{A}}_{ex} ={\tilde{A}}(I+J)\), and make the following transformations
By the distribution assumption of \({\tilde{A}}\), we know
which induces
where \(\Psi _{ex}=\Psi (I+J)\). The last proportionality holds because \(Jacobian_{1}\) is constant with respect to \({\tilde{A}}_{ex}\). Combining Eq. (7) of Burgette and Nordheim (2012) and Eq. (23), we have
Since \(\alpha ^{2}=\alpha _{ex}^{2}\), \(A=A_{ex}(I+J)^{-1}\) and the Jacobian of this transformation is constant, from Eq. (24) we have
By integrating (25) over \(\alpha ^{2}\), we have
\(\square \)
Appendix 3: Proof of equal posterior distributions
Posterior distributions of the same parameters under different projected GSI models are equal to each other.
Proof
Without loss of generality, we only consider the posterior distributions of \(\beta _{-1}\) and \(\Sigma _{-1}\). For all \(b\ge 2\),
where
\(\square \)
Appendix 4: Derivation of the posterior of b
Assume the prior of b is uniform on the set \(P=\{1,2,\ldots , p\}\); then the posterior of b given Y is also uniform on P.
Proof
where
Since \(W_{-p}=D_{p}W_{-1}\), we have
Further,
where \(\Sigma _{-p}=D_{p}\Sigma _{-1}D_{p}^{T}\) and \(\delta _{k,-p}=D_{p}\delta _{k,-1}, k=1,\ldots , q_{2}\); here \(\delta _{k,-p}\) and \(\delta _{k,-1}\) are components of \(\beta _{-p}\) and \(\beta _{-1}\), respectively. Since the absolute values of the Jacobians \(J_{\Sigma _{-1}\rightarrow \Sigma _{-p}}\) and \(J_{\beta _{-1}\rightarrow \beta _{-p}}\) are both equal to 1, we have
By a similar argument, we obtain the following equalities
Because Y is uniquely determined by \(W_{-b}\),
According to Bayes’ rule,
since the prior of b is uniform. That is, the posterior of b given Y is uniform on P. \(\square \)
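The two facts the proof relies on, that the change of base \(W_{-p}=D_{p}W_{-1}\) has a Jacobian of absolute value 1 and that Y is determined by any \(W_{-b}\), can be illustrated with a small sketch. It rests on an assumption that is not stated in this appendix: the latent utilities are taken to sum to zero and \(W_{-b}\) is taken to be the utility vector with its b-th component dropped. The construction of `change_of_base_matrix` below is therefore hypothetical and may differ from the definition of \(D_{p}\) in the main text; the other helper names are hypothetical as well.

```python
def change_of_base_matrix(p):
    """Assumed D_p with W_{-p} = D_p W_{-1} when the p utilities sum to zero."""
    m = p - 1
    rows = [[-1] * m]  # W_1 = -(W_2 + ... + W_p)
    for i in range(m - 1):  # W_2, ..., W_{p-1} are copied unchanged
        rows.append([1 if j == i else 0 for j in range(m)])
    return rows

def matvec(a, v):
    """Matrix-vector product for nested lists."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def det(a):
    """Determinant by Laplace expansion along the first row (fine for small p)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def full_utilities(w_minus_b, b):
    """Rebuild the sum-to-zero vector W from W_{-b}; b is a 1-based label."""
    w = list(w_minus_b)
    w.insert(b - 1, -sum(w_minus_b))
    return w

def choice(w):
    """1-based label of the alternative with the largest utility."""
    return max(range(len(w)), key=lambda i: w[i]) + 1

p = 4
d_p = change_of_base_matrix(p)
w_minus_1 = [-1, 2, -4]              # (W_2, W_3, W_4), illustrative values
w_minus_p = matvec(d_p, w_minus_1)   # (W_1, W_2, W_3)
abs_det = abs(det(d_p))              # linear map, so this is the Jacobian of W_{-1} -> W_{-p}
y_from_1 = choice(full_utilities(w_minus_1, 1))
y_from_p = choice(full_utilities(w_minus_p, p))
print(w_minus_p, abs_det, y_from_1, y_from_p)
```

Under this assumption both reduced vectors rebuild the same full utility vector, so they imply the same observed choice, and the determinant of the assumed \(D_{p}\) has absolute value 1 for any p.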
Appendix 5: Bayesian estimation of \(\theta \)
Suppose \(Y_{1}, Y_{2},\ldots ,Y_{n}\) are i.i.d. \(N(\theta , \frac{1}{\phi })\). Then their joint density is given by
A conjugate prior distribution for \((\theta , \phi )\) is the normal-gamma distribution. In detail,
We get the joint posterior distribution for \((\theta , \phi )\) as follows
where
and \({\bar{y}} =\frac{1}{n}\sum _{i=1}^{n}y_{i},~ SS = \sum _{i=1}^{n}(y_{i}-{\bar{y}})^{2}.\) As pointed out by Hoff (2009), the marginal posterior distribution of \(\theta \) has a t distribution, i.e.
In the simulation studies in Sects. 4.1 and 4.2, \(Y_{1},\ldots ,Y_{50}\) represent the paired differences of average total variations between two competing models. In all cases, we set \(\mu _{0}=0\), which means that we have no prior preference for either of the paired models, and we set \(\kappa _{0}=1, \nu _{0}=1\). In addition, in view of \(\textrm{E}\phi =\frac{\nu _{0}}{SS_{0}}\), we set \(SS_{0}\approx \frac{\nu _{0}}{{\hat{\phi }}}\), where \({\hat{\phi }}\) is an estimate of \(\phi \); in our studies we take the reciprocal of the corresponding sample variance.
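The conjugate update described in this appendix can be sketched in a few lines. This is a minimal illustration, assuming the standard normal-gamma parameterization \(\theta \mid \phi \sim N(\mu _{0}, 1/(\kappa _{0}\phi ))\) and \(\phi \sim \text {Gamma}(\nu _{0}/2, SS_{0}/2)\) (shape and rate), which is consistent with \(\textrm{E}\phi =\nu _{0}/SS_{0}\) above. The function names and the simulated differences are hypothetical and are not the data of Sects. 4.1 and 4.2.

```python
import math
import random

def normal_gamma_posterior(y, mu0, kappa0, nu0, ss0):
    """Conjugate update for theta | phi ~ N(mu0, 1/(kappa0*phi)), phi ~ Gamma(nu0/2, rate=ss0/2)."""
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
    ss_n = ss0 + ss + kappa0 * n * (ybar - mu0) ** 2 / kappa_n
    # Marginally, (theta - mu_n) / t_scale is Student t with nu_n degrees of freedom.
    t_scale = math.sqrt(ss_n / (nu_n * kappa_n))
    return {"mu_n": mu_n, "kappa_n": kappa_n, "nu_n": nu_n, "ss_n": ss_n, "t_scale": t_scale}

def student_t_pdf(x, df, loc, scale):
    """Density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(z * z / df)) / scale

# Hypothetical paired ATV differences, simulated purely for illustration.
rng = random.Random(0)
y = [rng.gauss(0.05, 0.1) for _ in range(50)]
ybar = sum(y) / len(y)
sample_var = sum((yi - ybar) ** 2 for yi in y) / (len(y) - 1)

# SS_0 is set to nu_0 / phi_hat, with phi_hat the reciprocal of the sample variance.
nu0 = 1.0
post = normal_gamma_posterior(y, mu0=0.0, kappa0=1.0, nu0=nu0, ss0=nu0 * sample_var)
density_at_mode = student_t_pdf(post["mu_n"], post["nu_n"], post["mu_n"], post["t_scale"])
print(post, density_at_mode)
```

Evaluating `student_t_pdf` on a grid of \(\theta \) values with `post["nu_n"]`, `post["mu_n"]` and `post["t_scale"]` yields a posterior density curve of the kind plotted for \(\theta \).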
Figure 8 shows the posterior density plot of \(\theta \) in Sect. 4.1, in which the dashed line corresponds to the ATV of the BPH model minus that of the ProJ1 model, and the solid line corresponds to the ATV of the AveGSI model minus that of the ProJ1 model. Figure 9 shows the posterior density plot of \(\theta \) in Sect. 4.2, in which the dashed line corresponds to the ATV of the BN1 model minus that of the AveGSI model, and the solid line corresponds to the ATV of the ProJ1 model minus that of the AveGSI model. In Sect. 4.1, the posterior density plots of \(\theta \) resulting from comparisons between BPH and the GSIs resemble the dashed line in Fig. 8, and the others resemble the solid line. In Sect. 4.2, the posterior density plots of \(\theta \) resulting from comparisons between the BNs and AveGSI resemble the dashed line in Fig. 9, and the others resemble the solid line.
About this article
Cite this article
Pan, M., Gu, M., Wu, X. et al. Globally and symmetrically identified Bayesian multinomial probit model. Stat Comput 33, 68 (2023). https://doi.org/10.1007/s11222-023-10232-4
DOI: https://doi.org/10.1007/s11222-023-10232-4