Abstract
Inductive inference has been the subject of intensive research for several decades. For classification problems in particular, substantial advances have been made, and the field has matured into a wide range of powerful approaches to inductive inference. A considerable challenge arises, however, when deriving principles for an inductive supervised classifier in the presence of unpredictable or unanticipated events, that is, when the alphabet of observable features is unknown. Bayesian inductive theories based on de Finetti-type exchangeability, which have become popular in supervised classification, do not apply to such problems. Here we derive an inductive supervised classifier based on partition exchangeability due to John Kingman. We prove that, whereas a classifier based on de Finetti-type exchangeability can optimally handle test items independently of each other given an infinite amount of training data, a classifier based on partition exchangeability continues to benefit from a joint prediction of the labels for the whole population of test items. Some remarks on the relation of this work to generic convergence results in predictive inference are also given.
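To make the contrast concrete, the following sketch (not the authors' classifier, just an illustration) shows the one-step predictive rule implied by partition exchangeability via the Ewens sampling formula: having observed counts of each distinct feature value, the next observation repeats a previously seen value with probability proportional to its count, and takes a previously unseen value with probability proportional to a dispersion parameter θ. This positive probability on unseen symbols is exactly what de Finetti-type models over a fixed alphabet lack. The function name and the choice θ = 1 are illustrative assumptions.

```python
# Illustrative sketch of the Ewens predictive rule under partition
# exchangeability (theta is the dispersion parameter of the Ewens
# sampling formula; theta = 1 below is an arbitrary choice).
from collections import Counter

def ewens_predictive(observations, theta=1.0):
    """Return ({seen value: predictive probability}, P(previously unseen value)).

    Given n observations with counts n_j per distinct value, the next
    observation equals a seen value j with probability n_j / (n + theta),
    and is a new, previously unseen value with probability theta / (n + theta).
    """
    counts = Counter(observations)
    n = len(observations)
    seen = {value: c / (n + theta) for value, c in counts.items()}
    p_new = theta / (n + theta)
    return seen, p_new

# With n = 3 and theta = 1: P("a") = 2/4, P("b") = 1/4, P(new) = 1/4.
seen, p_new = ewens_predictive(["a", "a", "b"], theta=1.0)
```

Note that the probabilities depend on the data only through the counts of distinct values, not on which values were observed, which is the defining feature of partition exchangeability.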
References
Solomonoff, R.J.: A formal theory of inductive inference. Part I. Inf. Control 7, 1–22 (1964)
Hunt, E.B., Marin, J., Stone, P.J.: Experiments in induction. Academic Press, New York (1966)
Chow, C.K., Liu, C.N.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14, 462–467 (1968)
Bailey, N.T.J.: Probability methods of diagnosis based on small samples. In: Mathematics and Computer Science in Biology and Medicine. H.M. Stationery Office, London (1965)
Jain, A.K., Duin, R.P.W., Mao, J.: Statistical pattern recognition: A review. IEEE Trans. Patt. Anal. Mach. Intell. 22, 4–37 (2000)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification, 2nd edn. Wiley, New York (2000)
Bishop, C.M.: Pattern recognition and machine learning. Springer, New York (2007)
Hand, D.J., Yu, K.: Idiot’s Bayes: not so stupid after all. Int. Stat. Rev. 69, 385–398 (2001)
Jeffrey, R.: Probabilism and induction. Topoi 5, 51–58 (1986)
Zabell, S.L.: Predicting the unpredictable. Synthese 90, 205–232 (1992)
Angluin, D., Smith, C.H.: Inductive inference: Theory and methods. ACM Comput. Surv. 15, 237–268 (1983)
Michalski, R.S.: A theory and methodology of inductive learning. Artif. Intell. 20, 111–161 (1983)
Solomonoff, R.J.: Three kinds of probabilistic induction: universal distributions and convergence theorems. Christopher Stewart WALLACE (1933-2004); Memorial Special Issue. Comput. J. 51, 566–570 (2008)
Kingman, J.F.C.: The population structure associated with the Ewens sampling formula. Theor. Pop. Biol. 11, 274–283 (1977)
Kingman, J.F.C.: The representation of partition structures. J. London Math. Soc. 18, 374–380 (1978)
Kingman, J.F.C.: Random partitions in population genetics. Proc. Roy. Soc. A 361, 1–20 (1978)
Kingman, J.F.C.: Uses of exchangeability. Ann. Probab. 6, 183–197 (1978)
Friedman, N., Singer, Y.: Efficient Bayesian parameter estimation in large discrete domains. In: Kearns, M.J., Solla, S.A., Cohn, D.A. (eds.) Advances in Neural Information Processing Systems, vol. 11, pp. 417–423 (1998)
Orlitsky, A., Santhanam, N.P., Zhang, J.: Universal compression of memoryless sources over unknown alphabets. IEEE Trans. Inf. Theory 50, 1469–1481 (2004)
Wang, C., Blei, D.M.: Decoupling sparsity and smoothness in the discrete hierarchical Dirichlet process. In: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C.K.I., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 22, pp. 1982–1989 (2009)
Corander, J., Cui, Y., Koski, T., Sirén, J.: Have I seen you before? Principles of predictive classification revisited. Stat. Comput. 23, 59–73 (2013)
Cui, Y., Corander, J., Koski, T., Sirén, J.: Predictive Gaussian classifiers. Submitted to Bayesian Analysis (2013)
Ripley, B.D.: Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge (1996)
Nádas, A.: Optimal solution of a training problem in speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 33, 326–329 (1985)
Geisser, S.: Predictive discrimination. In: Krishnajah, P.R. (ed.) Multivariate analysis, pp. 149–163. Academic Press, New York (1966)
Geisser, S.: Predictive Inference: An introduction. Chapman & Hall, London (1993)
Dawson, K.J., Belkhir, K.: A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genet. Res. 78, 59–77 (2001)
Corander, J., Gyllenberg, M., Koski, T.: Random partition models and exchangeability for Bayesian identification of population structure. Bull. Math. Biol. 69, 797–815 (2007)
Jackson, M.O., Kalai, E., Smorodinsky, R.: Bayesian representation of stochastic processes under learning: de Finetti revisited. Econometrica 67, 875–893 (1999)
Solomonoff, R.J.: Complexity-based induction systems: comparisons and convergence theorems. IEEE Trans. Inf. Theory 24, 422–432 (1978)
Blackwell, D., Dubins, L.: Merging of opinions with increasing information. Ann. Math. Stat. 33, 882–886 (1962)
Joyce, P.: Partition structures and sufficient statistics. J. Appl. Prob. 35, 622–632 (1998)
Dowe, D.L.: Foreword re C. S. Wallace. Christopher Stewart WALLACE (1933-2004); Memorial Special Issue. Comput. J. 51, 523–560 (2008)
Dowe, D.L.: MML, hybrid Bayesian network graphical models, statistical consistency, invariance and uniqueness. In: Bandyopadhyay, P.S., Forster, M.R. (eds.) Handbook of the Philosophy of Science - Philosophy of Statistics, pp. 901–982. Elsevier, Oxford (2011)
© 2013 Springer-Verlag Berlin Heidelberg
Corander, J., Cui, Y., Koski, T. (2013). Inductive Inference and Partition Exchangeability in Classification. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44957-4
Online ISBN: 978-3-642-44958-1