An Empirical Study of Strategies Boosts Performance of Mutual Information Similarity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10842)

Abstract

In recent years, mutual information based measures have become widely used. The mutual information MINE measure is asserted to be the best strategy for identifying relationships in challenging data sets. A major weakness of the MINE similarity metric is its high execution time. To address this performance issue, numerous approaches have been suggested, covering both improved software implementations and the application of simplified heuristics. However, none of these approaches resolves the high execution time of MINE computation.

In this work, we address the latter issue. This paper presents a novel MINE implementation that achieves a performance increase of 530x+ compared with established approaches. The novel high-performance approach is the result of a structural evaluation of 30+ different MINE software implementations, none of which makes use of simplified heuristics. Hence, the proposed strategy for computing MINE mutual information is both accurate and fast. The novel mutual information MINE software is available at https://bitbucket.org/oekseth/mine-data-analysis/downloads/. To broaden its applicability, the high-performance MINE metric is integrated into the hpLysis machine learning library (https://bitbucket.org/oekseth/hplysis-cluster-analysis-software).
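
To make the cost of MINE-style computation concrete, the following is a minimal sketch of the quantity such measures build on: the mutual information of two variables after binning them onto a grid, normalized by the logarithm of the smaller grid dimension, and maximized over candidate grids. It is not the implementation presented in this paper. The function names (`grid_mutual_information`, `normalized_grid_score`, `naive_grid_search`) and the `max_cells` parameter are hypothetical, and the sketch assumes equal-width bins, whereas MINE also optimizes where the bin boundaries are placed. Even this simplified search repeats a full pass over the data for every grid shape, which hints at why execution time becomes a concern.

```python
import math
from collections import Counter


def _bin_index(value, lo, hi, bins):
    """Map value in [lo, hi] to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def grid_mutual_information(xs, ys, x_bins, y_bins):
    """Mutual information (in bits) of xs and ys on an equal-width grid."""
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    cells = [
        (_bin_index(x, x_lo, x_hi, x_bins), _bin_index(y, y_lo, y_hi, y_bins))
        for x, y in zip(xs, ys)
    ]
    joint = Counter(cells)                   # counts per grid cell
    x_marg = Counter(bx for bx, _ in cells)  # counts per column
    y_marg = Counter(by for _, by in cells)  # counts per row
    mi = 0.0
    for (bx, by), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with counts to limit rounding error
        ratio = count * n / (x_marg[bx] * y_marg[by])
        mi += p_xy * math.log2(ratio)
    return mi


def normalized_grid_score(xs, ys, x_bins, y_bins):
    """Grid mutual information divided by log2 of the smaller grid dimension."""
    return grid_mutual_information(xs, ys, x_bins, y_bins) / math.log2(
        min(x_bins, y_bins)
    )


def naive_grid_search(xs, ys, max_cells):
    """Best normalized score over all grid shapes with at most max_cells cells.

    Assumption: equal-width bins only, so this is an illustration of the
    search structure, not a MINE computation.
    """
    best = 0.0
    for x_bins in range(2, max_cells // 2 + 1):
        for y_bins in range(2, max_cells // x_bins + 1):
            best = max(best, normalized_grid_score(xs, ys, x_bins, y_bins))
    return best
```

For example, `naive_grid_search(list(range(16)), list(range(16)), 4)` evaluates a single 2-by-2 grid on a perfectly linear relationship, while two variables that are independent on that grid score zero.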

Notes

  1. To use low-level assembly instructions for hardware parallel computations (SSE) to reduce execution time.

Acknowledgements

The authors would like to thank MD K.I. Ekseth at UIO, Dr. O.V. Solberg at SINTEF, Dr. S.A. Aase at GE Healthcare, MD B.H. Helleberg at NTNU–medical, Dr. Y. Dahl, Dr. T. Aalberg, and K.T. Dragland at NTNU, and Professor P. Sætrom and the High Performance Computing Group at NTNU for their support.

Author information

Corresponding author

Correspondence to Ole Kristian Ekseth.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Ekseth, O.K., Hvasshovd, SO. (2018). An Empirical Study of Strategies Boosts Performance of Mutual Information Similarity. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10842. Springer, Cham. https://doi.org/10.1007/978-3-319-91262-2_29

  • DOI: https://doi.org/10.1007/978-3-319-91262-2_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91261-5

  • Online ISBN: 978-3-319-91262-2

  • eBook Packages: Computer Science (R0)
