Abstract
Graphs and networks are common ways of depicting biological information. Many biological processes are represented by graphs, including regulatory networks, metabolic pathways and protein-protein interaction networks. Such prior network knowledge is a useful supplement to standard numerical data such as microarray gene expression data and single nucleotide polymorphism (SNP) data. How to incorporate this prior network information into the analysis of numerical data raises interesting statistical problems. Representing genetic networks as undirected graphs, we have developed several approaches for identifying differentially expressed genes, and genes or SNPs associated with diseases, in a unified framework of hidden Markov random field (HMRF) models. Unlike traditional empirical Bayes approaches to the analysis of gene expression data, the HMRF-based models account for the prior dependency among genes on the network and therefore make effective use of the prior network information in identifying subnetworks of genes that are perturbed by experimental conditions. In this paper, we briefly review the basic setup of the HMRF models and the emission probability functions for some problems often encountered in the analysis of microarray gene expression and SNP data. We also present some interesting areas that require further research.
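To make the idea of network-dependent hidden states concrete, the following is a minimal sketch of a two-state HMRF on a small undirected gene network. It is not the chapter's estimation procedure: it uses iterated conditional modes (ICM) as a simple stand-in, and the normal emission densities, the parameters `beta`, `gamma` and `alt_mean`, and the toy scores and network are all assumptions made for illustration.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def icm_labels(scores, neighbors, beta=1.0, gamma=0.0, alt_mean=2.5, max_sweeps=50):
    """Iterated conditional modes for a two-state hidden Markov random field.

    scores[i]    : observed statistic for gene i (hypothetical z-score)
    neighbors[i] : genes adjacent to gene i in the undirected network
    beta         : assumed strength of agreement between neighbouring states
    gamma        : assumed baseline log-odds of the "perturbed" state
    alt_mean     : assumed mean of the statistic in the "perturbed" state
    """
    def emission_log_ratio(s):
        # Assumed emissions: N(alt_mean, 1) if perturbed, N(0, 1) otherwise.
        return normal_logpdf(s, alt_mean, 1.0) - normal_logpdf(s, 0.0, 1.0)

    # Start from the labels that maximize the emission density alone.
    labels = [1 if emission_log_ratio(s) > 0 else 0 for s in scores]
    for _ in range(max_sweeps):
        changed = False
        for i, s in enumerate(scores):
            n1 = sum(labels[j] for j in neighbors[i])  # perturbed neighbours
            n0 = len(neighbors[i]) - n1                # unperturbed neighbours
            # Conditional log-odds of state 1 given neighbours and data.
            log_odds = gamma + beta * (n1 - n0) + emission_log_ratio(s)
            new_label = 1 if log_odds > 0 else 0
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break  # reached a local mode of the posterior
    return labels

# Toy network: genes 0, 1, 2 form a triangle; 2-3 and 3-4 extend it as a chain.
toy_scores = [3.0, 2.8, 1.0, 0.1, -0.2]
toy_neighbors = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]

network_labels = icm_labels(toy_scores, toy_neighbors)                 # [1, 1, 1, 0, 0]
independent_labels = icm_labels(toy_scores, toy_neighbors, beta=0.0)   # [1, 1, 0, 0, 0]
```

In this toy example, gene 2 has a borderline score and is labelled unperturbed when genes are treated independently (`beta=0.0`), but is labelled perturbed once its two strongly perturbed neighbours are taken into account. This is the sense in which a network prior can borrow information across connected genes.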
Acknowledgements
This research was supported in part by NIH grants ES009911 and CA127334.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, H. (2011). Hidden Markov Random Field Models for Network-Based Analysis of Genomic Data. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_17
DOI: https://doi.org/10.1007/978-3-642-16345-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16344-9
Online ISBN: 978-3-642-16345-6
eBook Packages: Mathematics and Statistics (R0)