Resolution dependence of the maximal information coefficient for noiseless relationship

Statistics and Computing

Abstract

Reshef et al. (Science 334:1518–1523, 2011) introduced the maximal information coefficient (MIC), which captures a wide range of relationships between pairs of variables. We derive a useful property that can be employed either to substantially reduce the computer time needed to determine MIC or to obtain a series of MIC values for different resolutions. By studying the dependence of the MIC scores on the maximal resolution employed to partition the data, we show that relationships of different natures can be discerned more clearly. We also provide an iterative greedy algorithm, as an alternative to the ApproxMaxMI algorithm proposed by Reshef et al., that determines the value of MIC through iterative optimization and can be run in parallel.
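
As a rough illustration of what a resolution-dependent score looks like, the following Python sketch evaluates the normalized mutual information I/log min(s,t) on s×t grids whose number of cells st does not exceed a maximal resolution B, and reports the largest value found for several B. It is restricted to equipartition grids, so it only yields a lower bound on MIC. It is neither ApproxMaxMI nor the iterative greedy algorithm of this paper, and the function names and the sample relationship are assumptions made for the example.

```python
import math
from collections import Counter


def equipartition_labels(values, bins):
    """Assign each value a bin index so that bins hold roughly equal counts.

    Assumes the values are distinct; tied values could land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * bins // len(values)
    return labels


def mutual_information(cols, rows):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(cols)
    joint = Counter(zip(cols, rows))
    col_counts = Counter(cols)
    row_counts = Counter(rows)
    return sum(
        c / n * math.log(c * n / (col_counts[a] * row_counts[b]))
        for (a, b), c in joint.items()
    )


def equipartition_score(xs, ys, B):
    """Largest I / log(min(s, t)) over equipartition grids with s * t <= B.

    Hypothetical helper: a lower bound on MIC at resolution B, because the
    grid lines are not optimized.
    """
    best = 0.0
    for s in range(2, B // 2 + 1):
        cols = equipartition_labels(xs, s)
        for t in range(2, B // s + 1):
            rows = equipartition_labels(ys, t)
            score = mutual_information(cols, rows) / math.log(min(s, t))
            best = max(best, score)
    return best


if __name__ == "__main__":
    n = 256
    xs = [i / (n - 1) for i in range(n)]
    ys = [math.sin(7 * x) for x in xs]  # an arbitrary noiseless relationship
    for B in (4, 8, 16, 32):
        print(B, round(equipartition_score(xs, ys, B), 4))
```

Because the set of admissible grids only grows with B, the score returned by this sketch is non-decreasing in B.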

Notes

  1. In our work, we adopt $st \le B$ instead of $st < B$ for convenience in producing our figures.

  2. For any $st < B$ not on the shell, one has $I(D,s,t) \le I(D,s,t')$ and $I(D,s,t) \le I(D,s',t)$ for both $st' \approx B$ and $s't \approx B$ on the shell. It follows that either $m_{s,t} \le m_{s,t'}$ or $m_{s,t} \le m_{s',t}$.

  3. As the partition line $\mu_{2i-1}$ is moved by one lattice point (say, from $j$ to $j+1$), only a small number of data pairs (only one in most of the cases studied in this paper) cross this partition line. To reduce the computation time, we calculate the difference in mutual information, $I(\mu_{2i-1}=j+1) - I(\mu_{2i-1}=j)$, directly rather than evaluating $I(\mu_{2i-1}=j+1)$ and $I(\mu_{2i-1}=j)$ separately.

  4. In practice, deadlocked loops may be encountered in simulations owing to the accumulation of truncation errors. We employ a parameter tolerance $=10^{-35}$ (or one may adopt an even smaller dynamic value, say $2\times10^{-36}$, and gradually raise it as deadlocked loops are encountered), within which differences (in quad precision) between mutual information values are regarded as insignificant. For two different positions of a gridline with equal mutual information, the one closer to the choice of the last iteration is favored.

  5. By equipartition on the x- (y-) axis or the columns (rows), we mean that approximately the same number of data points are assigned to each column (row).

  6. An “OutOfMemoryError” is raised when the computation is run on a data set of $2^{18}$ pairs with the default value $c=15$, but no error occurs with the lower value $c=10$.

  7. The average of ζ is about 20 and 30 for relationships of class (ii) and class (iii), respectively.

  8. Strictly speaking, this instance is not of class (ii) because it contains a flat segment. Hence, our results here hold for a larger group than class (ii), one that allows flat or vertical lines.
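
Note 3 describes evaluating only the change in mutual information when a partition line moves by one lattice point. A minimal Python sketch of that idea follows, under the assumption that exactly one data pair crosses a boundary between two adjacent columns. The count-table layout (a list of columns) and the function names are illustrative choices, not taken from the paper. The sketch relies on the identity n·I = Σ n_ij log n_ij − Σ c_i log c_i − Σ r_j log r_j + n log n, where n_ij are cell counts, c_i column totals and r_j row totals, so that only two cell counts and two column totals enter the difference.

```python
import math


def xlogx(k):
    """k * log(k), with the convention 0 * log(0) = 0."""
    return k * math.log(k) if k > 0 else 0.0


def table_mutual_information(table):
    """Mutual information (nats) of a count table given as a list of columns."""
    n = sum(map(sum, table))
    col_sums = [sum(col) for col in table]
    row_sums = [sum(row) for row in zip(*table)]
    total = sum(xlogx(c) for col in table for c in col)
    total -= sum(xlogx(c) for c in col_sums)
    total -= sum(xlogx(r) for r in row_sums)
    return (total + xlogx(n)) / n


def delta_mi(table, col_sums, n, i, j):
    """Change in mutual information when one point in row j moves
    from column i + 1 to column i (the table itself is not modified).

    Row totals and n are unchanged by the move, so only the two affected
    cells and the two affected column totals contribute.
    """
    a, b = table[i][j], table[i + 1][j]
    ca, cb = col_sums[i], col_sums[i + 1]
    change = (xlogx(a + 1) - xlogx(a)) + (xlogx(b - 1) - xlogx(b))
    change -= (xlogx(ca + 1) - xlogx(ca)) + (xlogx(cb - 1) - xlogx(cb))
    return change / n


if __name__ == "__main__":
    table = [[3, 1, 0], [2, 2, 1], [0, 1, 4]]  # made-up counts, columns first
    n = sum(map(sum, table))
    col_sums = [sum(col) for col in table]
    before = table_mutual_information(table)
    predicted = delta_mi(table, col_sums, n, 0, 1)
    table[0][1] += 1
    table[1][1] -= 1
    print(predicted, table_mutual_information(table) - before)
```

The incremental update touches a constant number of terms per moved point, whereas a full recomputation visits every cell of the table.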

References

  • Gray, R.: Entropy and Information Theory, 2nd edn. Springer, Boston (2011)

  • Mackay, D.: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge (2003)

  • Reshef, D., Reshef, Y., Finucane, H., Grossman, R., McVean, G., Turnbaugh, P., Lander, E., Mitzenmacher, M., Sabeti, P.: Detecting novel associations in large data sets. Science 334, 1518–1523 (2011)

  • Speed, T.: A correlation for the 21st Century. Science 334, 1502–1503 (2011)

Acknowledgements

The work is supported in part by the National Science Council of the Republic of China under Grants No. NSC-100-2112-M002-007, and NSC-100-2112-M032-002-MY3.

Author information

Corresponding author

Correspondence to Wen-Jer Tzeng.

About this article

Cite this article

Lee, SC., Pang, NN. & Tzeng, WJ. Resolution dependence of the maximal information coefficient for noiseless relationship. Stat Comput 24, 845–852 (2014). https://doi.org/10.1007/s11222-013-9405-5

  • DOI: https://doi.org/10.1007/s11222-013-9405-5
