Abstract
Mutual information (MI) is an important measure of dependency between random variables, owing to its tight connection with information theory, and it has numerous applications in both theory and practice. When employed in practice, however, the MI usually has to be estimated from available data. Several methods exist for approximating the MI, but arguably the simplest and most widespread is the histogram-based approach. This paper proposes the use of fuzzy partitioning for histogram-based MI estimation. It employs a general form of fuzzy membership functions that includes the class of crisp membership functions as a special case. It is shown that the average absolute error of the fuzzy-histogram method is lower than that of the naïve histogram method. Moreover, the accuracy of the proposed technique is comparable to, and in some cases better than, that of kernel density estimation (KDE), one of the most accurate MI estimation methods, while its computational cost is significantly lower. The new estimator is investigated from different aspects, such as average error, bias, and variance. We also explore the usefulness of the fuzzy-histogram MI estimator in a real-world bioinformatics application: in contrast to the naïve histogram estimator, it reveals all the dependencies present in gene-expression data.
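The estimator itself is developed in the body of the article; as a rough illustration of the idea behind fuzzy-histogram MI estimation, the sketch below builds a fuzzy joint histogram from a triangular (Ruspini) fuzzy partition and plugs the resulting frequencies into the standard plug-in MI formula. The triangular membership functions, the function names, and all parameter choices are illustrative assumptions, not the authors' exact construction, which admits a more general family of membership functions.

```python
import numpy as np

# Illustrative sketch (not the paper's exact method): fuzzy-histogram MI
# estimation using a triangular (Ruspini) fuzzy partition of each axis.

def triangular_memberships(data, n_bins):
    """Membership degree of each sample in each fuzzy bin.
    Returns an (n_samples, n_bins) matrix whose rows sum to 1."""
    centers = np.linspace(data.min(), data.max(), n_bins)
    width = centers[1] - centers[0]
    # Triangular membership: 1 at the bin center, 0 beyond adjacent centers.
    mu = np.maximum(0.0, 1.0 - np.abs(data[:, None] - centers[None, :]) / width)
    return mu / mu.sum(axis=1, keepdims=True)

def fuzzy_histogram_mi(x, y, n_bins=10):
    """Plug-in MI estimate (in nats) from a fuzzy joint histogram."""
    mu_x = triangular_memberships(np.asarray(x, float), n_bins)
    mu_y = triangular_memberships(np.asarray(y, float), n_bins)
    p_xy = mu_x.T @ mu_y / len(x)            # fuzzy joint frequencies
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / np.outer(p_x, p_y)[mask])))

# Example: correlated Gaussian pair with rho = 0.5;
# the true MI is -0.5 * ln(1 - rho^2) ≈ 0.144 nats.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.5 * x + np.sqrt(1 - 0.25) * rng.normal(size=2000)
print(fuzzy_histogram_mi(x, y, n_bins=12))
```

Replacing the membership matrix with one-hot bin indicators recovers the naïve histogram estimator, the crisp special case mentioned in the abstract.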
Cite this article
Amir Haeri, M., Ebadzadeh, M.M. Estimation of mutual information by the fuzzy histogram. Fuzzy Optim Decis Making 13, 287–318 (2014). https://doi.org/10.1007/s10700-014-9178-0