Abstract
Because even a moderately sized dataset has a huge number of partitions, an exhaustive search for the highest-scoring (MAP) partition is usually impossible, even when Bayes factors have a closed form. Deterministic and random search algorithms alike therefore traverse only a small proportion of the partitions of a large dataset. The main contribution of this paper is to encode the formal Bayes factor search on partitions as a weighted MAX-SAT problem and to use well-known solvers for that problem to search for partitions. We demonstrate how, with appropriate priors over the partition space, this method can be used to search the space of partitions fully in smaller problems and to enhance the performance of more familiar algorithms in larger problems. We illustrate our method on the clustering of time-course microarray experiments.
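To make the idea of a weighted MAX-SAT encoding concrete, the following is a minimal sketch of one possible encoding, not necessarily the one used in the paper. It assumes that the log score of a partition is the sum of the log scores of its clusters (as under a product-partition style prior) and that every cluster log score is negative. Each candidate cluster gets a Boolean variable; hard clauses force the chosen clusters to form a partition, and a soft clause penalises each chosen cluster by minus its log score. The function names, the toy score table, and the brute-force solver are all illustrative assumptions; in practice a dedicated MAX-SAT solver would replace the exhaustive loop.

```python
from itertools import combinations, product


def candidate_clusters(n_items):
    """All non-empty subsets of range(n_items), as frozensets."""
    items = range(n_items)
    return [frozenset(c) for r in range(1, n_items + 1)
            for c in combinations(items, r)]


def encode(n_items, log_score):
    """Build hard and soft clauses. Literals are signed 1-based ints."""
    clusters = candidate_clusters(n_items)
    var = {c: k + 1 for k, c in enumerate(clusters)}
    hard = []
    # Every item must belong to at least one chosen cluster.
    for i in range(n_items):
        hard.append([var[c] for c in clusters if i in c])
    # Two overlapping clusters can never both be chosen.
    for a, b in combinations(clusters, 2):
        if a & b:
            hard.append([-var[a], -var[b]])
    # Soft unit clause (not x_c) with weight -log_score(c) > 0:
    # choosing cluster c forfeits that weight.
    soft = [(-log_score(c), [-var[c]]) for c in clusters]
    return clusters, hard, soft


def satisfied(clause, assignment):
    """True if at least one literal of the clause holds."""
    return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)


def solve_brute_force(n_vars, hard, soft):
    """Exhaustive stand-in for a real MAX-SAT solver (tiny instances only)."""
    best, best_weight = None, float("-inf")
    for assignment in product([False, True], repeat=n_vars):
        if not all(satisfied(c, assignment) for c in hard):
            continue
        weight = sum(w for w, c in soft if satisfied(c, assignment))
        if weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight


def map_partition(n_items, log_score):
    """Return the highest-scoring partition as a sorted list of sorted lists."""
    clusters, hard, soft = encode(n_items, log_score)
    assignment, _ = solve_brute_force(len(clusters), hard, soft)
    return sorted(sorted(c) for c, on in zip(clusters, assignment) if on)


def toy_log_score(cluster):
    """Hypothetical cluster log scores, invented purely for illustration."""
    table = {
        frozenset({0}): -1.0, frozenset({1}): -1.0, frozenset({2}): -1.0,
        frozenset({0, 1}): -1.5, frozenset({0, 2}): -3.0,
        frozenset({1, 2}): -3.0, frozenset({0, 1, 2}): -4.0,
    }
    return table[cluster]


if __name__ == "__main__":
    print(map_partition(3, toy_log_score))  # [[0, 1], [2]]
```

Enumerating every subset as a candidate cluster is only feasible for a handful of items; for larger datasets the candidate set would have to be restricted, which is consistent with the abstract's distinction between full search in smaller problems and enhancement of other algorithms in larger ones.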
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liverani, S., Cussens, J., Smith, J.Q. (2010). Searching a Multivariate Partition Space Using MAX-SAT. In: Masulli, F., Peterson, L.E., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2009. Lecture Notes in Computer Science, vol 6160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14571-1_18
DOI: https://doi.org/10.1007/978-3-642-14571-1_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14570-4
Online ISBN: 978-3-642-14571-1
eBook Packages: Computer Science, Computer Science (R0)