Abstract
We present a parallel method for construction of gene regulatory networks from large-scale gene expression data. Our method integrates mutual information, the data processing inequality, and statistical testing to detect significant dependencies between genes, and efficiently exploits the parallelism inherent in such computations. We present a novel method to carry out permutation testing for assessing statistical significance while reducing its computational complexity by a factor of Θ(n²), where n is the number of genes. Using both synthetic and known regulatory networks, we show that our method produces networks of quality similar to ARACNE, a widely used mutual-information-based method. We present a parallelization of the algorithm that, for the first time, allows construction of whole-genome networks from thousands of microarray experiments using rigorous mutual-information-based methodology. We report the construction of a 15,147-gene network of the plant Arabidopsis thaliana from 2,996 microarray experiments on a 2,048-CPU Blue Gene/L in 45 minutes, thus addressing a grand challenge problem in the NSF Arabidopsis 2010 initiative.
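The three ingredients named in the abstract (pairwise mutual information, a permutation-based significance threshold, and pruning by the data processing inequality) can be illustrated with a small serial sketch. This is not the authors' implementation: the histogram estimator, the pooled null distribution, the strict DPI rule without a tolerance, and every function name and parameter below (`mutual_information`, `permutation_threshold`, `apply_dpi`, `bins`, `n_perm`, `alpha`) are assumptions made for illustration, and the sketch does not attempt the paper's parallelization or its reduced-cost permutation test.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X;Y) in nats (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def mi_matrix(expr, bins=8):
    """Symmetric matrix of pairwise MI; rows of expr are gene profiles."""
    n = expr.shape[0]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j], bins)
    return mi


def permutation_threshold(expr, n_perm=200, alpha=0.01, bins=8, seed=0):
    """MI cutoff from a pooled null: shuffle one profile of a random pair."""
    rng = np.random.default_rng(seed)
    n = expr.shape[0]
    null = np.empty(n_perm)
    for k in range(n_perm):
        i, j = rng.choice(n, size=2, replace=False)
        null[k] = mutual_information(expr[i], rng.permutation(expr[j]), bins)
    return float(np.quantile(null, 1.0 - alpha))


def apply_dpi(mi, adj):
    """Drop edge (i, j) if it is the weakest edge of a connected triple."""
    n = mi.shape[0]
    keep = adj.copy()
    for i in range(n):
        for j in range(i + 1, n):
            if not adj[i, j]:
                continue
            for k in range(n):
                if k in (i, j) or not (adj[i, k] and adj[j, k]):
                    continue
                if mi[i, j] < min(mi[i, k], mi[j, k]):
                    keep[i, j] = keep[j, i] = False
                    break
    return keep


if __name__ == "__main__":
    # Toy chain x -> y -> z plus an unrelated profile w (synthetic data).
    rng = np.random.default_rng(1)
    x = rng.normal(size=2000)
    y = x + 0.3 * rng.normal(size=2000)
    z = y + 0.3 * rng.normal(size=2000)
    w = rng.normal(size=2000)
    expr = np.vstack([x, y, z, w])

    mi = mi_matrix(expr)
    adj = mi > permutation_threshold(expr)
    np.fill_diagonal(adj, False)
    print(apply_dpi(mi, adj).astype(int))
```

In this toy chain the indirect x–z dependency is statistically significant on its own, so thresholding alone would keep it; the DPI step is what removes it, because it is the weakest edge of the x, y, z triple.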
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zola, J., Aluru, M., Aluru, S. (2008). Parallel Information Theory Based Construction of Gene Regulatory Networks. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2008. HiPC 2008. Lecture Notes in Computer Science, vol 5374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89894-8_31
DOI: https://doi.org/10.1007/978-3-540-89894-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89893-1
Online ISBN: 978-3-540-89894-8
eBook Packages: Computer Science, Computer Science (R0)