Summary
This paper extends deterministic models for Boolean regression within a Bayesian framework. For a given binary criterion variable Y and a set of k binary predictor variables X1,…, Xk, a Boolean regression model is a conjunctive (or disjunctive) logical combination of a subset S of the X variables that predicts Y. Formally, a Boolean regression model specifies a k-dimensional binary indicator vector (θ1,…,θk) with θj = 1 iff Xj ∈ S. In a probabilistic extension, a parameter π is added, representing the probability that the predicted value \(\hat{y}_i\) and the observed value \(y_i\) differ (for any observation i). Within a Bayesian framework, a posterior distribution of the parameters (θ1,…, θk, π) is derived. The advantages of such a Bayesian approach include a proper account of the uncertainty in the model estimates and various possibilities for model checking (using posterior predictive checks). We illustrate the method with an example using real data.
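To make the model concrete, the following is a minimal sketch in Python on hypothetical toy data, restricted for simplicity to positive literals and a conjunctive rule; the function names and data are illustrative, not the paper's code or example.

```python
import numpy as np

def predict_conjunctive(X, theta):
    """Conjunctive Boolean prediction: y_hat_i is the AND of the selected predictors.

    X     : (n, k) binary array of predictor values
    theta : (k,) binary indicator vector; theta[j] = 1 iff Xj is in S
    """
    selected = X[:, theta == 1]
    # An empty conjunction is vacuously true: np.all over zero columns yields True.
    return np.all(selected == 1, axis=1).astype(int)

def likelihood(y, y_hat, pi):
    """p(y | theta, pi) = pi^D * (1 - pi)^(n - D), with D the number of mismatches."""
    D = np.sum(y != y_hat)
    n = len(y)
    return pi**D * (1 - pi)**(n - D)

# Hypothetical toy data: n = 4 observations, k = 3 binary predictors.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
y = np.array([1, 0, 0, 1])
theta = np.array([1, 1, 0])   # S = {X1, X2}

y_hat = predict_conjunctive(X, theta)
print(y_hat)                          # [1 0 0 1]: a perfect prediction here (D = 0)
print(likelihood(y, y_hat, pi=0.1))   # (1 - 0.1)^4 = 0.6561
```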

Notes
1. In total, 3^14 = 4,782,969 combinations are to be considered: each of the 14 predictors is either positively present, negatively present, or not present.
Acknowledgments
The authors gratefully acknowledge Brian Junker, Herbert Hoijtink, and William Browne for helpful comments on an earlier draft of this paper, and Johannes Berkhof for helpful discussions.
This work was supported in part by the Research Fund of K.U.Leuven, Grant OT/96/10, and the U.S. National Science Foundation Grant SBR-9708424.
Appendix: Deriving posterior distributions
We first compute the prior predictive distribution p(y):
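A sketch of this computation, assuming a prior p(θ) over the candidate indicator vectors and a Beta(α, β) prior on π (the uniform prior being the special case α = β = 1), and writing \(D(\theta) = \sum_{i=1}^n |y_i - \hat{y}_i(\theta)|\) for the number of discrepancies, so that \(p(y \mid \theta, \pi) = \pi^{D(\theta)} (1-\pi)^{n-D(\theta)}\):

\[
\begin{aligned}
p(y) &= \sum_{\theta} p(\theta) \int_0^1 p(y \mid \theta, \pi)\, p(\pi)\, d\pi \\
&= \sum_{\theta} p(\theta) \int_0^1 \pi^{D(\theta)} (1-\pi)^{n-D(\theta)}\, \frac{\pi^{\alpha-1} (1-\pi)^{\beta-1}}{B(\alpha, \beta)}\, d\pi \\
&= \sum_{\theta} p(\theta)\, \frac{B(D(\theta)+\alpha,\; n-D(\theta)+\beta)}{B(\alpha, \beta)} \int_0^1 \frac{\pi^{D(\theta)+\alpha-1} (1-\pi)^{n-D(\theta)+\beta-1}}{B(D(\theta)+\alpha,\; n-D(\theta)+\beta)}\, d\pi \\
&= \sum_{\theta} p(\theta)\, \frac{B(D(\theta)+\alpha,\; n-D(\theta)+\beta)}{B(\alpha, \beta)},
\end{aligned}
\]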
the integral in the third step being equal to 1 as it is the area under a Beta density.
For the posterior distribution of (θ,π), we start from Eq. (5):
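Reading Eq. (5) as Bayes' rule applied to this model, a sketch under the same assumptions as above:

\[
p(\theta, \pi \mid y) = \frac{p(\theta)\, p(\pi)\, p(y \mid \theta, \pi)}{p(y)} \propto p(\theta)\, \pi^{D(\theta)+\alpha-1} (1-\pi)^{n-D(\theta)+\beta-1}.
\]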
To derive the marginal posterior distribution of θ, π is integrated out of the joint posterior distribution of (θ, π) given above.
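Continuing the sketch under the same assumptions:

\[
\begin{aligned}
p(\theta \mid y) &\propto p(\theta) \int_0^1 \pi^{D(\theta)+\alpha-1} (1-\pi)^{n-D(\theta)+\beta-1}\, d\pi \\
&= p(\theta)\, B(D(\theta)+\alpha,\; n-D(\theta)+\beta) \int_0^1 \frac{\pi^{D(\theta)+\alpha-1} (1-\pi)^{n-D(\theta)+\beta-1}}{B(D(\theta)+\alpha,\; n-D(\theta)+\beta)}\, d\pi \\
&= p(\theta)\, B(D(\theta)+\alpha,\; n-D(\theta)+\beta),
\end{aligned}
\]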
the latter integral being 1 as it is again the area under a Beta density.
Leenen, I., Van Mechelen, I. & Gelman, A. Bayesian probabilistic extensions of a deterministic classification model. Computational Statistics 15, 355–371 (2000). https://doi.org/10.1007/s001800000039