An Empirical Study of Strategies Boosts Performance of Mutual Information Similarity

Ekseth, Ole Kristian; Hvasshovd, Svein-Olav

doi:10.1007/978-3-319-91262-2_29


Conference paper
First Online: 11 May 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10842)

Abstract

In recent years, mutual-information-based measures have become widely popular. The mutual information MINE measure is asserted to be the best strategy for identifying relationships in challenging data sets. A major weakness of the MINE similarity metric is its high execution time. To address this performance issue, numerous approaches have been suggested, both improved software implementations and simplified heuristics. However, none of these approaches resolves the high execution time of MINE computation.

In this work, we address the latter issue. This paper presents a novel MINE implementation that achieves a 530x+ performance increase compared to established approaches. The high-performance approach is the result of a structural evaluation of 30+ different MINE software implementations, none of which makes use of simplified heuristics. Hence, the proposed strategy for computing MINE mutual information is both accurate and fast. The novel mutual information MINE software is available at https://bitbucket.org/oekseth/mine-data-analysis/downloads/. To broaden its applicability, the high-performance MINE metric is integrated into the hpLysis machine learning library (https://bitbucket.org/oekseth/hplysis-cluster-analysis-software).
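
For orientation, MINE statistics such as the maximal information coefficient are built from the mutual information of two variables discretized on a grid, normalized by the logarithm of the smaller grid dimension. The following is a minimal sketch of that inner computation for a single fixed grid; it is not the authors' implementation. The function names, the equal-width binning, and the choice of bits as the unit are assumptions made for illustration. A full MINE computation searches over many grid resolutions and partitions, which is where the execution time discussed above arises.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width bin index in [0, n_bins - 1]."""
    if hi == lo:
        return 0  # all values identical: everything falls in one bin
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last bin


def grid_mutual_information(xs, ys, n_bins_x, n_bins_y):
    """Mutual information (in bits) of xs and ys on one fixed grid.

    Hypothetical helper: equal-width bins are an assumption; MINE
    itself optimizes the grid partition instead of fixing it.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    cells = [
        (bin_index(x, x_lo, x_hi, n_bins_x), bin_index(y, y_lo, y_hi, n_bins_y))
        for x, y in zip(xs, ys)
    ]
    joint = Counter(cells)                 # counts per grid cell
    row = Counter(i for i, _ in cells)     # marginal counts for x bins
    col = Counter(j for _, j in cells)     # marginal counts for y bins
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (row[i] * col[j]))
    return mi


def normalized_grid_score(xs, ys, n_bins_x, n_bins_y):
    """Grid mutual information divided by log2(min(n_bins_x, n_bins_y))."""
    if min(n_bins_x, n_bins_y) < 2:
        raise ValueError("each axis needs at least two bins")
    return grid_mutual_information(xs, ys, n_bins_x, n_bins_y) / math.log2(
        min(n_bins_x, n_bins_y)
    )
```

Calling `normalized_grid_score(xs, ys, 2, 2)` on two equal-length lists of numbers returns a value between 0 and 1 for that one grid; taking the maximum of such scores over a family of grids is the step that a MINE implementation has to make fast.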

Notes

1. To use low-level assembly instructions for hardware-parallel computations (SSE) to reduce execution time.

Acknowledgements

The authors would like to thank MD K.I. Ekseth at UIO, Dr. O.V. Solberg at SINTEF, Dr. S.A. Aase at GE Healthcare, MD B.H. Helleberg at NTNU–medical, Dr. Y. Dahl, Dr. T. Aalberg, and K.T. Dragland at NTNU, and Professor P. Sætrom and the High Performance Computing Group at NTNU for their support.

Author information

Authors and Affiliations

Department of Computer Science (IDI), NTNU, Trondheim, Norway
Ole Kristian Ekseth & Svein-Olav Hvasshovd

Corresponding author

Correspondence to Ole Kristian Ekseth.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Częstochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Częstochowa, Poland
Rafał Scherer
Częstochowa University of Technology, Częstochowa, Poland
Marcin Korytkowski
University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
AGH University of Science and Technology, Kraków, Poland
Ryszard Tadeusiewicz
University of Louisville, Louisville, KY, USA
Jacek M. Zurada

Cite this paper

Ekseth, O.K., Hvasshovd, SO. (2018). An Empirical Study of Strategies Boosts Performance of Mutual Information Similarity. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science, vol 10842. Springer, Cham. https://doi.org/10.1007/978-3-319-91262-2_29

DOI: https://doi.org/10.1007/978-3-319-91262-2_29
Published: 11 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91261-5
Online ISBN: 978-3-319-91262-2
eBook Packages: Computer Science, Computer Science (R0)
