Abstract
The Shannon information content is a fundamental quantity, and estimating it from an observed dataset is of great importance in statistics, information theory, and machine learning. In this study, an estimator of the information content for a given set of weighted data is proposed; the empirical data distribution varies with the weights. The notable features of the proposed estimator are its computational efficiency and its ability to handle weighted data. The estimator is then extended to estimate cross entropy, entropy, and the Kullback–Leibler (KL) divergence from weighted data. Finally, the estimators are applied to classification with one-class samples and to distribution-preserving data compression problems.
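To make the notion of entropy estimation from weighted data concrete, the following is a minimal sketch of a weighted plug-in estimator: the density is approximated by a weighted Gaussian kernel density estimate, and the entropy by the weighted average of negative log-densities at the sample points, H ≈ −Σᵢ wᵢ log p̂(xᵢ). This is a generic illustration of the idea, not the computationally efficient estimator proposed in the paper; all function names and the bandwidth choice are assumptions.

```python
import math
import random

def weighted_kde(x, samples, weights, bandwidth):
    """Weighted Gaussian kernel density estimate at point x (1-D)."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    return sum(
        w * math.exp(-0.5 * ((x - s) / bandwidth) ** 2) / norm
        for s, w in zip(samples, weights)
    )

def weighted_entropy(samples, weights, bandwidth=0.3):
    """Plug-in entropy estimate for weighted data:
    H ~ -sum_i w_i * log p_hat(x_i), with weights normalised to sum to one.
    NOTE: an illustrative sketch, not the estimator proposed in the paper."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise the weights
    return -sum(
        wi * math.log(weighted_kde(xi, samples, w, bandwidth))
        for xi, wi in zip(samples, w)
        if wi > 0  # zero-weight points do not contribute
    )

# With uniform weights this reduces to an ordinary resubstitution
# estimate; for N(0,1) the true differential entropy is
# 0.5*log(2*pi*e) ~ 1.419 nats, a useful sanity reference.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(500)]
uniform = [1.0] * len(data)
print(round(weighted_entropy(data, uniform), 3))
```

Changing the weights changes the empirical distribution the estimate refers to: concentrating all weight on a single sample, for example, drives the estimate down toward the (negative log) kernel peak, reflecting a nearly degenerate distribution.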
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Hino, H., Murata, N. (2011). A Computationally Efficient Information Estimator for Weighted Data. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2011. ICANN 2011. Lecture Notes in Computer Science, vol 6792. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21738-8_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21737-1
Online ISBN: 978-3-642-21738-8
eBook Packages: Computer Science (R0)