Abstract
The Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. One way to implement this principle in practice is to compute the Normalized Maximum Likelihood (NML) distribution for a given parametric model class. Unfortunately, this is computationally infeasible for many model classes of practical importance. In this paper we present a fast algorithm for computing the NML for the Naive Bayes model class, which is frequently used in classification and clustering tasks. The algorithm is based on a relationship between powers of generating functions and discrete convolution. The resulting algorithm has time complexity \(\mathcal{O}(n^{2})\), where n is the size of the data.
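The link between powers of generating functions and discrete convolution can be illustrated on a simpler case than Naive Bayes. The sketch below is not the algorithm of this paper: it assumes the single multinomial model with K outcomes, whose NML normalizing sum C(K, n) equals n!/n^n times the coefficient of z^n in B(z)^K, where B(z) = Σ_j (j^j / j!) z^j. The function names `convolve`, `series_power`, and `multinomial_normalizer` are hypothetical, and the power is taken by plain repeated convolution rather than by any faster method.

```python
from fractions import Fraction
from math import factorial


def convolve(a, b, n):
    """Discrete convolution of two coefficient lists, truncated to degree n."""
    out = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(a[: n + 1]):
        for j, bj in enumerate(b[: n + 1 - i]):
            out[i + j] += ai * bj
    return out


def series_power(a, k, n):
    """Coefficients of a(z)**k up to z**n, via k successive convolutions."""
    result = [Fraction(1)] + [Fraction(0)] * n  # the constant series 1
    for _ in range(k):
        result = convolve(result, a, n)
    return result


def multinomial_normalizer(k, n):
    """NML normalizing sum C(k, n) for a k-outcome multinomial (assumed model)."""
    # b[j] = j**j / j!; Python evaluates 0**0 as 1, matching the convention used here.
    b = [Fraction(j ** j, factorial(j)) for j in range(n + 1)]
    coeff = series_power(b, k, n)[n]
    return coeff * factorial(n) / Fraction(n ** n)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, multinomial_normalizer(k, 2))
```

Each call to `convolve` costs on the order of n² coefficient multiplications, so this naive version scales as K·n²; exact rational arithmetic is used only to keep the illustration free of rounding error.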
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mononen, T., Myllymäki, P. (2007). Fast NML Computation for Naive Bayes Models. In: Corruble, V., Takeda, M., Suzuki, E. (eds) Discovery Science. DS 2007. Lecture Notes in Computer Science, vol 4755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75488-6_15
DOI: https://doi.org/10.1007/978-3-540-75488-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75487-9
Online ISBN: 978-3-540-75488-6
eBook Packages: Computer Science, Computer Science (R0)