
Globally and symmetrically identified Bayesian multinomial probit model

  • Original Paper
  • Published in Statistics and Computing

Abstract

Bayesian multinomial probit models have been widely used to analyze discrete choice data. Existing methods suffer either from incomplete parameter identification or from posterior inference that is sensitive to the labeling of choice objects. This study addresses both problems simultaneously. First, we propose a globally and symmetrically identified multinomial probit model whose covariance matrix is positive semidefinite. However, it is difficult to design an efficient Bayesian algorithm to fit this model directly, because it is infeasible to sample a positive semidefinite matrix from a standard distribution. We therefore develop a projected version of the proposed model via a projection technique; the projected model is equivalent to the original one, but is equipped with a positive definite covariance matrix. Finally, based on the projected model, we develop an efficient Bayesian fitting algorithm using modern Markov chain Monte Carlo techniques. Through simulation studies and an analysis of clothes detergent purchase data, we demonstrate that our approach not only solves the identifiability problem but also exhibits robustness and satisfactory estimation accuracy, at comparable computational cost.




References

  • Albert, J., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88, 669–679 (1993)

  • Anderson, S.P., de Palma, A., Thisse, J.F.: Discrete Choice Theory of Product Differentiation. MIT Press, Cambridge (1992)

  • Ben-Akiva, M., Lerman, S.R.: Discrete Choice Analysis: Theory and Application to Predict Travel Demand. MIT Press, Cambridge (1985)

  • Burgette, L., Nordheim, E.: The trace restriction: an alternative identification strategy for the Bayesian multinomial probit model. J. Bus. Econ. Stat. 30, 404–410 (2012)

  • Burgette, L., Puelz, D., Hahn, P.: A symmetric prior for multinomial probit models. Bayesian Anal. 16(3), 991–1008 (2021)

  • de Bekker-Grob, E.W., Ryan, M., Gerard, K.: Discrete choice experiments in health economics: a review of the literature. Health Econ. 21, 145–172 (2012)

  • Fong, D.K.H., Kim, S., Chen, Z., et al.: A Bayesian multinomial probit model for the analysis of panel choice data. Psychometrika 81, 161–183 (2016)

  • Hausman, J.A., Wise, D.A.: A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46, 403–426 (1978)

  • Hoff, P.D.: A First Course in Bayesian Statistical Methods. Springer, New York (2009)

  • Imai, K., van Dyk, D.: A Bayesian analysis of the multinomial probit model using marginal data augmentation. J. Econom. 124, 311–334 (2005a)

  • Imai, K., van Dyk, D.: MNP: R package for fitting the multinomial probit model. J. Stat. Softw. 14, 1–32 (2005b)

  • Keane, M.P.: A note on identification in the multinomial probit model. J. Bus. Econ. Stat. 10, 193–200 (1992)

  • Kruschke, J.K.: Bayesian estimation supersedes the t test. J. Exp. Psychol. Gen. 142(2), 573–603 (2013)

  • McCulloch, R., Rossi, P.: An exact likelihood analysis of the multinomial probit model. J. Econom. 64, 207–240 (1994)

  • McCulloch, R., Polson, N., Rossi, P.: A Bayesian analysis of the multinomial probit model with fully identified parameters. J. Econom. 99, 173–193 (2000)

  • Nobile, A.: A hybrid Markov chain for the Bayesian analysis of the multinomial probit model. Stat. Comput. 8, 229–242 (1998)

  • Quinn, K.M., Martin, A.D., Whitford, A.B.: Voter choice in multi-party democracies: a test of competing theories and models. Am. J. Political Sci. 43(4), 1231–1247 (1999)

  • Small, K.A., Rosen, H.S.: Applied welfare economics with discrete choice models. Econometrica 49, 105–130 (1981)

  • Tanner, M., Wong, W.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–540 (1987)

  • Yang, S., Allenby, G.M.: Modeling interdependent consumer preferences. J. Mark. Res. 40, 282–294 (2003)

  • Yu, P.L.H.: Bayesian analysis of order-statistics models for ranking data. Psychometrika 65, 281–299 (2000)


Acknowledgements

This research is partially supported by two grants from National Natural Science Foundation of China (11501287 and 71571096) and three grants from the Research Grants Council General Research Fund of the Hong Kong Special Administrative Region, China (14303819, 14203915 and 14173817).

Author information


Corresponding author

Correspondence to Xiaodan Fan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Appendices

Appendix 1: Prediction bias regarding BPH’s symmetric MNP model

To start with, we recall the MNP model defined in Eq. (7),

$$\begin{aligned} W=X\beta + \epsilon ,\quad \epsilon \sim \text {N}(0, \Sigma ), \end{aligned}$$
(20)

where W, X and \(\beta ,\Sigma \) are defined as in Sect. 3.1. As discussed before, the parameters in the model (20) are not identified unless some restrictions are imposed on them. Previous studies usually fix the first diagonal entry \(\sigma _{11}\) of \(\Sigma \) at some positive value. For ease of explanation, we first scale the model (20) with the restriction \(\sigma _{11}=1\) and take it as the baseline model. For the model (20) scaled by another identification method, denoted by

$$\begin{aligned} {\tilde{W}}=X{\tilde{\beta }} + {\tilde{\epsilon }},\quad {\tilde{\epsilon }}\sim \text {N}(0, {\tilde{\Sigma }} ), \end{aligned}$$
(21)

with \({\tilde{\sigma }}_{11}=\alpha ^{2}\) where \(\alpha \) is a fixed positive value, we have \({\tilde{\Sigma }}=\alpha ^{2}\Sigma \) and \({\tilde{\beta }}=\alpha \beta \). Furthermore, we have \({\tilde{W}}=\alpha W\) in distribution, and for \( j=1,2,\ldots , p\),

$$\begin{aligned} P({\tilde{w}}_{j}>\max _{k\ne j}{\tilde{w}}_{k})=P(w_{j}>\max _{k\ne j}w_{k}). \end{aligned}$$
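This scale invariance is easy to check by simulation. The sketch below (Python; the mean vector, covariance matrix and scale \(\alpha \) are arbitrary illustrative choices, not values from the paper) draws latent utilities W and verifies that rescaling them leaves the chosen object, and hence every choice probability, unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
mu = np.array([0.5, 0.0, -0.3])      # illustrative systematic utilities X*beta
L = rng.standard_normal((p, p))
Sigma = L @ L.T + p * np.eye(p)      # an arbitrary positive definite covariance

W = rng.multivariate_normal(mu, Sigma, size=10_000)
alpha = 2.0                          # any fixed positive scale
W_tilde = alpha * W                  # tilde-W = alpha * W in distribution

# The argmax (chosen object) is invariant under positive scaling, so the
# empirical choice probabilities coincide sample by sample.
assert np.array_equal(np.argmax(W, axis=1), np.argmax(W_tilde, axis=1))
print(np.bincount(np.argmax(W, axis=1), minlength=p) / len(W))
```

Because the argmax agrees draw by draw, the equality of choice probabilities holds exactly, not just in the Monte Carlo limit.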

Next, we consider BPH’s partial trace restriction on the model (20). Suppose the b-th diagonal entry of the covariance matrix is singled out and the sum of the remaining diagonal entries is restricted to \(p-1\), which yields the following identified model

$$\begin{aligned} W^{(b)}=X\beta ^{(b)} + \epsilon ^{(b)},\quad \epsilon ^{(b)}\sim \text {N}(0, \Sigma ^{(b)} ), \end{aligned}$$
(22)

where \(tr(\Sigma ^{(b)}_{-b})=p-1\). As in the model (21), there exists \(\alpha _{b}^{2}=\frac{p-1}{tr(\Sigma _{-b})}\) such that \(\beta ^{(b)}=\alpha _{b}\beta \), \(\Sigma ^{(b)}=\alpha _{b}^{2}\Sigma \) and \(W^{(b)}=\alpha _{b}W\) in distribution. BPH’s symmetric MNP model averages the above models and obtains the posterior mean estimates of the coefficient vector and covariance matrix as follows,

$$\begin{aligned} \beta ^{*}=&f_{1}\beta ^{(1)}+f_{2}\beta ^{(2)}+\cdots +f_{p}\beta ^{(p)}\\=&(f_{1}\alpha _{1}+f_{2}\alpha _{2}+\cdots +f_{p}\alpha _{p})\beta ,\\ \Sigma ^{*}=&f_{1}\Sigma ^{(1)}+f_{2}\Sigma ^{(2)}+\cdots +f_{p}\Sigma ^{(p)}\\=&(f_{1}\alpha _{1}^{2}+f_{2}\alpha _{2}^{2}+\cdots +f_{p}\alpha _{p}^{2})\Sigma , \end{aligned}$$

where \(f_{j}\) denotes the posterior probability of the event {b=j}. Under BPH’s uniform prior on parameter \(b\in \{1,2,\ldots , p\}\), we have \(f_{j}>0\) for all j. By Jensen’s inequality,

$$\begin{aligned}{} & {} (f_{1}\alpha _{1}+f_{2}\alpha _{2}+\cdots +f_{p}\alpha _{p})^{2}\\{} & {} \qquad \le f_{1}\alpha _{1}^{2}+f_{2}\alpha _{2}^{2}+\cdots +f_{p}\alpha _{p}^{2}, \end{aligned}$$

where equality holds if and only if \(\alpha _{1}=\alpha _{2}=\cdots =\alpha _{p}\). For simplicity, denote

$$\begin{aligned} \alpha _{\beta }=&f_{1}\alpha _{1}+f_{2}\alpha _{2}+\cdots +f_{p}\alpha _{p},\\ \alpha _{\sigma }=&\sqrt{f_{1}\alpha _{1}^{2}+f_{2}\alpha _{2}^{2}+\cdots +f_{p}\alpha _{p}^{2}}. \end{aligned}$$
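These quantities are easy to evaluate numerically. The sketch below (Python; the diagonal of \(\Sigma \) and the weights \(f_{j}\) are illustrative values, not quantities from the paper) computes each \(\alpha _{b}^{2}=(p-1)/tr(\Sigma _{-b})\) and checks the Jensen gap \(\alpha _{\beta }\le \alpha _{\sigma }\):

```python
import numpy as np

p = 4
Sigma_diag = np.array([1.0, 2.0, 0.5, 1.5])  # unequal diagonal of Sigma (illustrative)
f = np.full(p, 1.0 / p)                      # posterior weights f_j (uniform prior on b)

# alpha_b^2 = (p - 1) / tr(Sigma_{-b}); tr(Sigma_{-b}) drops the b-th diagonal entry.
alpha = np.sqrt((p - 1) / (Sigma_diag.sum() - Sigma_diag))

alpha_beta = f @ alpha               # f-weighted mean of the alpha_b
alpha_sigma = np.sqrt(f @ alpha**2)  # square root of the f-weighted mean of alpha_b^2

# Jensen's inequality: strict whenever the alpha_b are not all equal.
assert alpha_beta < alpha_sigma
```

With equal diagonal entries the \(\alpha _{b}\) coincide and the two quantities agree, matching the equality condition above.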

For a given covariate matrix X, denote \(X\beta \) by \(\mu \) with elements \(\mu _{i}, i=1,2,\ldots ,p\). Then the latent random vector \(W^{*}\) defined by BPH’s symmetric MNP model follows the normal distribution

$$\begin{aligned} W^{*}\sim N(\alpha _{\beta }\mu , \alpha _{\sigma }^{2}\Sigma ). \end{aligned}$$

Taking \(\alpha =\alpha _{\sigma }\) in the MNP model defined in Eq. (21), the latent random vector

$$\begin{aligned} {\tilde{W}}\sim N(\alpha _{\sigma }\mu , \alpha _{\sigma }^{2}\Sigma ). \end{aligned}$$

Without loss of generality, assume \(\mu _{1}\ge \mu _{k}, k=2,\ldots ,p\), with at least one strict inequality. In addition, suppose that not all diagonal elements of \(\Sigma \) are equal, so that not all \(\alpha _{i}, i=1,2,\ldots ,p\), are equal. Then \(\alpha _{\beta }<\alpha _{\sigma }\) and further

$$\begin{aligned}&P(w_{1}^{*}>w_{k}^{*}, k\in P_{1})\\ =&P(w_{1}^{*}-\alpha _{\beta }\mu _{1}>w_{k}^{*}-\alpha _{\beta }\mu _{1}, k\in P_{1})\\ =&P(w_{1}^{*}-\alpha _{\beta }\mu _{1}>w_{k}^{*}-\alpha _{\beta }\mu _{k}+\alpha _{\beta }(\mu _{k}-\mu _{1}), k\in P_{1})\\ =&P({\tilde{w}}_{1}-\alpha _{\sigma }\mu _{1}>{\tilde{w}}_{k}-\alpha _{\sigma }\mu _{k}+\alpha _{\beta }(\mu _{k}-\mu _{1}), k\in P_{1})\\ <&P({\tilde{w}}_{1}-\alpha _{\sigma }\mu _{1}>{\tilde{w}}_{k}-\alpha _{\sigma }\mu _{k}+\alpha _{\sigma }(\mu _{k}-\mu _{1}), k\in P_{1})\\ =&P({\tilde{w}}_{1}>{\tilde{w}}_{k}, k\in P_{1})\\ =&P(w_{1}>w_{k}, k\in P_{1}),\\ \end{aligned}$$

where \(P_{1}=\{2,3,\ldots ,p\}\). The third equality holds because \(W^{*}-\alpha _{\beta }\mu ={\tilde{W}}-\alpha _{\sigma }\mu \) in distribution, and the last equality holds because \({\tilde{W}}=\alpha _{\sigma }W\) in distribution. The above inequality shows that the probability of choosing the object with label 1 under BPH’s symmetric MNP model is smaller than that under the baseline model. In other words, BPH’s symmetric MNP model distorts the true choice probabilities unless all the diagonal entries of the true covariance matrix are equal.
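For \(p=2\) the distortion can be computed in closed form: under the baseline model \(P(w_{1}>w_{2})=\Phi ((\mu _{1}-\mu _{2})/\tau )\) with \(\tau ^{2}=\text {var}(\epsilon _{1}-\epsilon _{2})\), while the symmetric model yields \(\Phi (\alpha _{\beta }(\mu _{1}-\mu _{2})/(\alpha _{\sigma }\tau ))\). A minimal numeric sketch (Python; \(\mu \) and \(\Sigma \) are illustrative values chosen so that the diagonal entries differ):

```python
import numpy as np
from scipy.stats import norm

# Two-object illustration: mu_1 > mu_2 and unequal diagonal entries of Sigma.
mu1, mu2 = 1.0, 0.0
Sigma = np.array([[1.0, 0.2],
                  [0.2, 3.0]])
tau = np.sqrt(Sigma[0, 0] + Sigma[1, 1] - 2 * Sigma[0, 1])  # sd of w_1 - w_2

p = 2
f = np.full(p, 1.0 / p)
alpha = np.sqrt((p - 1) / (np.trace(Sigma) - np.diag(Sigma)))
alpha_beta = f @ alpha
alpha_sigma = np.sqrt(f @ alpha**2)

p_baseline = norm.cdf((mu1 - mu2) / tau)
p_symmetric = norm.cdf(alpha_beta * (mu1 - mu2) / (alpha_sigma * tau))

# The averaged model scales the mean by alpha_beta but the noise by alpha_sigma,
# so it understates the probability of choosing object 1.
assert p_symmetric < p_baseline
```

The gap vanishes exactly when the diagonal entries of \(\Sigma \) are equal, in line with the statement above.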

Appendix 2: Prior for covariance matrix with trace augmented restriction

Suppose the \(p\times p\) matrix \({\tilde{A}}\sim \text {InvWishart}(\nu ,\Psi )\), where \(\Psi \) is a positive definite \(p\times p\) matrix and \(\nu (\ge p)\) is the degrees of freedom. Transform \({\tilde{A}}\) to \((\alpha ^{2}, A)\) as follows,

$$\begin{aligned} \alpha ^{2}=\frac{\text {tr}({\tilde{A}}(I+J))}{p-1}, \quad A=\frac{{\tilde{A}}}{\alpha ^{2}}, \end{aligned}$$

where I is the \(p\times p\) identity matrix and J is the \(p\times p\) matrix with all entries equal to 1. Let \(1\{\cdot \}\) be the indicator function; then the joint distribution of \(\alpha ^{2}\) and A is

$$\begin{aligned} p(\alpha ^2, A)\propto&\mid A\mid ^{-\frac{v+p+1}{2}} \exp \{-\frac{\text {tr}(\Psi A^{-1})}{2\alpha ^2}\} (\alpha ^2) ^{-\frac{vp+2}{2}}\\ {}&\cdot 1\{\text {tr}(A(I+J))=p-1\}, \end{aligned}$$

and the marginal distribution of A is

$$\begin{aligned} p(A)\propto&\mid A\mid ^{-\frac{v+p+1}{2}}\cdot [\text {tr}(\Psi A^{-1})]^{-\frac{vp}{2}}\\ {}&\cdot 1\{\text {tr}(A(I+J))=p-1\}. \end{aligned}$$

Proof

Set \({\tilde{A}}_{ex} ={\tilde{A}}(I+J)\), and make the following transformations

$$\begin{aligned} \alpha _{ex}^{2}=\frac{\text {tr}({\tilde{A}}_{ex})}{p-1}, \quad A_{ex}=\frac{{\tilde{A}}_{ex}}{\alpha _{ex}^{2}}. \end{aligned}$$
(23)

By the distribution assumption of \({\tilde{A}}\), we know

$$\begin{aligned} p({\tilde{A}})\propto \mid {\tilde{A}}\mid ^{-\frac{v+p+1}{2}} \exp \{-\frac{\text {tr}(\Psi {\tilde{A}}^{-1})}{2}\}, \end{aligned}$$

which induces

$$\begin{aligned} p({\tilde{A}}_{ex})&\propto \mid {\tilde{A}}_{ex}\mid ^{-\frac{v+p+1}{2}} \exp \{-\frac{\text {tr}(\Psi _{ex} {\tilde{A}}_{ex}^{-1})}{2}\}\cdot Jacobian_{1}\\&\propto \mid {\tilde{A}}_{ex}\mid ^{-\frac{v+p+1}{2}} \exp \{-\frac{\text {tr}(\Psi _{ex} {\tilde{A}}_{ex}^{-1})}{2}\}, \end{aligned}$$

where \(\Psi _{ex}=\Psi (I+J)\). The last proportionality holds because \(Jacobian_{1}\) is constant with respect to \({\tilde{A}}_{ex}\). Combining Eq. (7) of Burgette and Nordheim (2012) with Eq. (23), we have

$$\begin{aligned} p(\alpha _{ex}^{2}, A_{ex})\propto&\mid A_{ex}\mid ^{-\frac{v+p+1}{2}} \exp \{-\frac{\text {tr}(\Psi _{ex} A_{ex}^{-1})}{2\alpha _{ex}^2}\} \nonumber \\ {}&\cdot (\alpha _{ex}^2)^{-\frac{vp+2}{2}}\cdot 1\{\text {tr}(A_{ex})=p-1\}. \end{aligned}$$
(24)

Since \(\alpha ^{2}=\alpha _{ex}^{2}\), \(A=A_{ex}(I+J)^{-1}\) and the Jacobian of such transformation is constant, from Eq. (24) we have

$$\begin{aligned} p(\alpha ^{2}, A)\propto&\mid A\mid ^{-\frac{v+p+1}{2}} \exp \{-\frac{\text {tr}(\Psi A^{-1})}{2\alpha ^2}\} (\alpha ^2)^{-\frac{vp+2}{2}}\nonumber \\ {}&\cdot 1\{\text {tr}(A(I+J))=p-1\}. \end{aligned}$$
(25)

By integrating (25) over \(\alpha ^{2}\), we have

$$\begin{aligned} p(A)\propto&\mid A\mid ^{-\frac{v+p+1}{2}}\cdot [\text {tr}(\Psi A^{-1})]^{-\frac{vp}{2}} \\ {}&\cdot 1\{\text {tr}(A(I+J))=p-1\}. \end{aligned}$$

\(\square \)
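The transformation above can be verified by simulation. The sketch below (Python, using SciPy's inverse-Wishart sampler; \(\nu \) and \(\Psi \) are illustrative choices) draws \({\tilde{A}}\), forms \((\alpha ^{2}, A)\), and checks that A satisfies the trace augmented restriction \(\text {tr}(A(I+J))=p-1\):

```python
import numpy as np
from scipy.stats import invwishart

p = 4
nu = p + 2            # degrees of freedom, nu >= p (illustrative)
Psi = np.eye(p)       # illustrative positive definite scale matrix
I, J = np.eye(p), np.ones((p, p))

# Draw A-tilde ~ InvWishart(nu, Psi) and apply the transformation.
A_tilde = invwishart.rvs(df=nu, scale=Psi, random_state=1)
alpha2 = np.trace(A_tilde @ (I + J)) / (p - 1)
A = A_tilde / alpha2

# By construction, A satisfies the trace augmented restriction.
assert abs(np.trace(A @ (I + J)) - (p - 1)) < 1e-10
```

Since \(\alpha ^{2}>0\), A inherits positive definiteness from \({\tilde{A}}\), so the induced prior lives on the restricted set of positive definite matrices.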

Appendix 3: Proof of equal posterior distributions

Posterior distributions of the same parameters under different projected GSI models are equal to each other.

Proof

Without loss of generality, we consider only the posterior distributions of \(\beta _{-1}\) and \(\Sigma _{-1}\). For all \(b\ge 2\),

$$\begin{aligned}&p_{b}(\beta _{-b}, \Sigma _{-b}\mid ProJ_{b},W)\\&\quad = p(W_{-b}\mid \beta _{-b},\Sigma _{-b})\cdot p(\beta _{-b})\cdot p(\Sigma _{-b})\\&\quad =\exp \{-\frac{1}{2}\sum _{i=1}^{n}(W_{i,-b}-X_{i,-b}\beta _{-b})^{T}\Sigma _{-b}^{-1}\\&\quad \cdot (W_{i,-b}-X_{i,-b}\beta _{-b})\}\\&\quad \cdot (2\pi )^{-\frac{n(p-1)}{2}}\mid \Sigma _{-b}\mid ^{-\frac{n}{2}}\cdot p(\beta _{-b})\cdot p(\Sigma _{-b}), \end{aligned}$$

and therefore

$$\begin{aligned}&p_{1}(\beta _{-1}, \Sigma _{-1}\mid ProJ_{b},W)\\&\quad = p_{b}(\beta _{-1}^{b},D_{b}\Sigma _{-1}D_{b}^{T}\mid ProJ_{b},W)\\&\quad \cdot \mid J_{\beta _{-b}\rightarrow \beta _{-1}}\mid \cdot \mid J_{\Sigma _{-b}\rightarrow \Sigma _{-1}}\mid \\&\quad =\exp \{-\frac{1}{2}\sum _{i=1}^{n}(W_{i,-b}-X_{i,-b}\beta _{-1}^{b})^{T} (D_{b}\Sigma _{-1}D_{b}^{T})^{-1}\\&\quad \cdot (W_{i,-b}-X_{i,-b}\beta _{-1}^{b})\}\cdot (2\pi )^{-\frac{n(p-1)}{2}}\\&\quad \cdot \mid D_{b}\Sigma _{-1}D_{b}^{T}\mid ^{-\frac{n}{2}}\cdot p(\beta _{-1}^{b})\cdot p(D_{b}\Sigma _{-1}D_{b}^{T})\cdot 1\cdot 1\\&\quad =\exp \{-\frac{1}{2}\sum _{i=1}^{n}(D_{b}^{-1}W_{i,-b}-D_{b}^{-1}X_{i,-b}\beta _{-1}^{b})^{T} \Sigma _{-1}^{-1}\\&\quad \cdot (D_{b}^{-1}W_{i,-b}-D_{b}^{-1}X_{i,-b}\beta _{-1}^{b})\}\cdot (2\pi )^{-\frac{n(p-1)}{2}}\\&\quad \cdot \mid \Sigma _{-1}\mid ^{-\frac{n}{2}}\cdot p(\beta _{-1})\cdot p(\Sigma _{-1})\\&\quad =\exp \{-\frac{1}{2}\sum _{i=1}^{n}(W_{i,-1}-X_{i,-1}\beta _{-1})^{T} \Sigma _{-1}^{-1}\\&\quad \cdot (W_{i,-1}-X_{i,-1}\beta _{-1})\}\cdot (2\pi )^{-\frac{n(p-1)}{2}}\\&\quad \cdot \mid \Sigma _{-1}\mid ^{-\frac{n}{2}}\cdot p(\beta _{-1})\cdot p(\Sigma _{-1})\\&\quad =p_{1}(\beta _{-1}, \Sigma _{-1}\mid ProJ_{1}, W), \end{aligned}$$

where

$$\begin{aligned} \beta _{-1}^{b}= \left( \begin{array}{c} D_{b}\delta _{1,-1}\\ \vdots \\ D_{b}\delta _{q_{1},-1}\\ \gamma \end{array}\right) . \end{aligned}$$

\(\square \)

Appendix 4: Derivation of the posterior of b

Assume the prior of b is uniform on the set \(P=\{1,2,\ldots , p\}\). Then the posterior of b given Y is also uniform on P.

Proof

$$\begin{aligned}&p(W_{-1}\mid b=1)\\&\quad =\int f(W_{-1}\mid b=1,\beta _{-1},\Sigma _{-1}) p(\beta _{-1}) p(\Sigma _{-1})\textrm{d}\beta _{-1}\textrm{d}\Sigma _{-1} \end{aligned}$$

where

$$\begin{aligned}&f(W_{-1}\mid b=1,\beta _{-1},\Sigma _{-1})\\&=\exp \{-\frac{1}{2}(W_{-1}-X_{-1}\beta _{-1})^{T} \Sigma _{-1}^{-1}(W_{-1}-X_{-1}\beta _{-1})\}\\&\cdot (2\pi )^{-\frac{p-1}{2}}\mid \Sigma _{-1}\mid ^{-\frac{1}{2}}\\&p(\beta _{-1})=(2\pi )^{-\frac{q_{1}(p-1)+q_{2}}{2}}\mid B\mid ^{-\frac{1}{2}}\exp \{-\frac{1}{2}\beta _{-1}^{T}B^{-1}\beta _{-1}\}. \end{aligned}$$

Since \(W_{-p}=D_{p}W_{-1}\), we have

$$\begin{aligned}&p(W_{-p}\mid b=1)\\ =&\int f(D_{p}^{-1}W_{-p}\mid b=1,\beta _{-1},\Sigma _{-1}) p(\beta _{-1}) p(\Sigma _{-1})\textrm{d}\beta _{-1}\textrm{d}\Sigma _{-1}. \end{aligned}$$

Further,

$$\begin{aligned}&f(D_{p}^{-1}W_{-p}\mid b=1,\beta _{-1},\Sigma _{-1}) \\&\quad =\exp \{-\frac{1}{2}(W_{-p}-D_{p}X_{-1}\beta _{-1})^{T}(D_{p}\Sigma _{-1}D_{p}^{T})^{-1}\\ {}&\cdot (W_{-p}-D_{p}X_{-1}\beta _{-1})\}\cdot (2\pi )^{-\frac{p-1}{2}}\mid \Sigma _{-1}\mid ^{-\frac{1}{2}}\\&\quad =\exp \{-\frac{1}{2}(W_{-p}-X_{-p}\beta _{-p})^{T}\Sigma _{-p}^{-1}(W_{-p}-X_{-p}\beta _{-p})\}\\&\quad \cdot (2\pi )^{-\frac{p-1}{2}}\mid \Sigma _{-p}\mid ^{-\frac{1}{2}}\\&\quad =f(W_{-p}\mid b=p,\beta _{-p},\Sigma _{-p}), \end{aligned}$$

where \(\Sigma _{-p}=D_{p}\Sigma _{-1}D_{p}^{T}\) and \(\delta _{k,-p}=D_{p}\delta _{k,-1}, k=1,\ldots , q_{1}\); here \(\delta _{k,-p}\) and \(\delta _{k,-1}\) are components of \(\beta _{-p}\) and \(\beta _{-1}\), respectively. Since the absolute values of the Jacobians \(J_{\Sigma _{-1}\rightarrow \Sigma _{-p}}\) and \(J_{\beta _{-1}\rightarrow \beta _{-p}}\) both equal 1, we have

$$\begin{aligned}&p(W_{-p}\mid b=1) \\&\quad =\int f(W_{-p}\mid b=p,\beta _{-p},\Sigma _{-p}) p(\beta _{-p}) p(\Sigma _{-p})\textrm{d}\beta _{-p}\textrm{d}\Sigma _{-p}\\&\quad = p(W_{-p}\mid b=p). \end{aligned}$$

By similar deductions, we obtain the following equalities

$$\begin{aligned} p(W_{-p}\mid b=1)= & {} p(W_{-p}\mid b=2)= \cdots \\= & {} p(W_{-p}\mid b=p). \end{aligned}$$

Because Y is uniquely determined by \(W_{-b}\),

$$\begin{aligned} P(Y\mid b=1)= P(Y\mid b=2)= \cdots =P(Y\mid b=p). \end{aligned}$$

By Bayes’ rule,

$$\begin{aligned} P(b=j\mid Y)= \frac{P(Y\mid b=j)P(b=j)}{P(Y)}=\frac{1}{p}, \quad j\in P, \end{aligned}$$

since the prior of b is uniform. That is, the posterior of b given Y is uniform on P. \(\square \)

Appendix 5: Bayesian estimation of \(\theta \)

Suppose \(Y_{1}, Y_{2},\ldots ,Y_{n}\) are i.i.d. N\((\theta , \frac{1}{\phi })\). Then their joint density is given by

$$\begin{aligned}&p(y_{1},\ldots ,y_{n}\mid \theta , \phi )\\&\quad =(2\pi )^{-\frac{n}{2}}\phi ^{\frac{n}{2}}\exp \{-\frac{\phi }{2}\sum _{i=1}^{n}(y_{i}-\theta )^{2}\}. \end{aligned}$$

A conjugate prior distribution for \((\theta , \phi )\) is the normal-gamma distribution. In detail,

$$\begin{aligned} \theta \mid \phi \sim N(\mu _{0}, \frac{1}{\kappa _{0}\phi }),\quad \phi \sim Ga(\frac{\nu _{0}}{2}, \frac{SS_{0}}{2}). \end{aligned}$$

We get the joint posterior distribution for \((\theta , \phi )\) as follows

$$\begin{aligned} \theta \mid \phi , y\sim N(\mu _{n}, \frac{1}{\kappa _{n}\phi }),\quad \phi \mid y\sim Ga(\frac{\nu _{n}}{2}, \frac{SS_{n}}{2}) \end{aligned}$$

where

$$\begin{aligned} \kappa _{n}&=\kappa _{0}+n,\quad \nu _{n} =\nu _{0}+n\\ \mu _{n}&=\frac{\kappa _{0}\mu _{0}+n{\bar{y}}}{\kappa _{n}}\\ SS_{n}&=SS_{0}+SS+\frac{n\kappa _{0}}{\kappa _{n}}({\bar{y}}-\mu _{0})^{2} \end{aligned}$$

and \({\bar{y}} =\frac{1}{n}\sum _{i=1}^{n}y_{i},~ SS = \sum _{i=1}^{n}(y_{i}-{\bar{y}})^{2}.\) As pointed out by Hoff (2009), the marginal posterior distribution of \(\theta \) has a t distribution, i.e.

$$\begin{aligned} \frac{\theta -\mu _{n}}{\sqrt{\frac{SS_{n}}{\kappa _{n}\nu _{n}}}}\sim t(\nu _{n}). \end{aligned}$$

In the simulation studies in Sects. 4.1 and 4.2, \(Y_{1},\ldots ,Y_{50}\) represent the paired differences of average total variations between two competing models. In all cases we set \(\mu _{0}=0\), which expresses no prior preference for either model in a pair, and set \(\kappa _{0}=1, \nu _{0}=1\). In addition, since \(\textrm{E}\phi =\frac{\nu _{0}}{SS_{0}}\), we set \(SS_{0}\approx \frac{\nu _{0}}{{\hat{\phi }}}\), where \({\hat{\phi }}\) is an estimate of \(\phi \); in our studies we take the reciprocal of the corresponding sample variance.
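The conjugate update above is straightforward to implement. The sketch below (Python; the simulated data stand in for the 50 paired ATV differences, which are not reproduced here) computes \((\kappa _{n}, \nu _{n}, \mu _{n}, SS_{n})\) and the t-scaled marginal posterior of \(\theta \):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)
y = rng.normal(0.1, 0.5, size=50)   # stand-in for the 50 paired ATV differences
n, ybar = len(y), y.mean()
SS = ((y - ybar) ** 2).sum()

# Hyperparameters as in the text: mu0 = 0, kappa0 = nu0 = 1,
# SS0 ~ nu0 / phi-hat with phi-hat the reciprocal of the sample variance.
mu0, kappa0, nu0 = 0.0, 1.0, 1.0
SS0 = nu0 * y.var(ddof=1)

kappa_n = kappa0 + n
nu_n = nu0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
SS_n = SS0 + SS + n * kappa0 / kappa_n * (ybar - mu0) ** 2

# Marginal posterior: (theta - mu_n) / sqrt(SS_n / (kappa_n * nu_n)) ~ t(nu_n).
scale = np.sqrt(SS_n / (kappa_n * nu_n))
lo, hi = t_dist.interval(0.95, df=nu_n, loc=mu_n, scale=scale)
print(f"posterior mean {mu_n:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

A density plot of this t posterior is what Figs. 8 and 9 display for the actual ATV differences.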

Figure 8 shows the posterior density plot of \(\theta \) in Sect. 4.1, in which the dashed line corresponds to the ATV differences of the BPH model subtracted by the ProJ1 model, and the solid line corresponds to the ATV differences of the AveGSI model subtracted by the ProJ1 model. Figure 9 shows the posterior density plot of \(\theta \) in Sect. 4.2, in which the dashed line corresponds to the ATV differences of the BN1 model subtracted by the AveGSI model, and the solid line corresponds to the ATV differences of the ProJ1 model subtracted by the AveGSI model. In Sect. 4.1, posterior density plots of \(\theta \) resulting from comparisons between BPH and GSIs resemble the dashed line in Fig. 8, and the others resemble the solid line. In Sect. 4.2, posterior density plots of \(\theta \) resulting from comparisons between BNs and AveGSI resemble the dashed line in Fig. 9, and the others resemble the solid line.

Fig. 8

Posterior density plot of \(\theta \) in Sect. 4.1. The dashed line results from the comparison between BPH and ProJ1, and the solid line results from the comparison between AveGSI and ProJ1

Fig. 9

Posterior density plot of \(\theta \) in Sect. 4.2. The dashed line results from the comparison between BN1 and AveGSI, and the solid line results from the comparison between ProJ1 and AveGSI


About this article


Cite this article

Pan, M., Gu, M., Wu, X. et al. Globally and symmetrically identified Bayesian multinomial probit model. Stat Comput 33, 68 (2023). https://doi.org/10.1007/s11222-023-10232-4

