Hidden Markov Random Field Models for Network-Based Analysis of Genomic Data

Li, Hongzhe

doi:10.1007/978-3-642-16345-6_17

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

Graphs and networks are common ways of depicting biological information. In biology, many different processes are represented by graphs, such as regulatory networks, metabolic pathways and protein-protein interaction networks. Such a priori graph information is a useful supplement to standard numerical data such as microarray gene expression data and single nucleotide polymorphism (SNP) data. How to incorporate such prior network information into the analysis of numerical data raises interesting statistical problems. Representing genetic networks as undirected graphs, we have developed several approaches for identifying differentially expressed genes, and genes or SNPs associated with diseases, in a unified framework of hidden Markov random field (HMRF) models. Unlike traditional empirical Bayes approaches to the analysis of gene expression data, the HMRF-based models account for the prior dependency among genes on the network and therefore use the prior network information effectively in identifying the subnetworks of genes that are perturbed by experimental conditions. In this paper, we briefly review the basic setup of the HMRF models and the emission probability functions for some problems often encountered in the analysis of microarray gene expression and SNP data. We also present some interesting areas that require further research.
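To make the HMRF idea concrete, the following is a minimal sketch and not the chapter's own method. It assumes a binary latent state per gene (1 = perturbed, 0 = not), an auto-logistic (Ising-type) prior over the edges of an undirected graph, and unit-variance Gaussian emissions for per-gene z-scores. The function name `icm_hmrf`, the parameters `gamma`, `beta` and `mu1`, and the toy data are all hypothetical. States are estimated with iterated conditional modes, a standard approximate technique swapped in here for illustration.

```python
def icm_hmrf(z_scores, edges, gamma=-1.0, beta=1.0, mu1=2.0, n_sweeps=10):
    """Label genes 0/1 with iterated conditional modes on a binary HMRF.

    Assumed (hypothetical) model:
      prior log-odds of state 1 at gene i = gamma + beta * (n1 - n0),
        where n1 / n0 count neighbours currently in state 1 / 0;
      emission: z_i ~ N(mu1, 1) in state 1 and N(0, 1) in state 0.
    """
    n = len(z_scores)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:  # undirected graph: store each edge in both directions
        neighbors[i].append(j)
        neighbors[j].append(i)

    def log_norm(z, mu):
        # Unit-variance Gaussian log density, up to an additive constant.
        return -0.5 * (z - mu) ** 2

    def emission_log_odds(z):
        return log_norm(z, mu1) - log_norm(z, 0.0)

    # Initialise from the emissions alone, ignoring the network.
    states = [1 if emission_log_odds(z) > 0 else 0 for z in z_scores]

    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            n1 = sum(states[j] for j in neighbors[i])
            n0 = len(neighbors[i]) - n1
            # Conditional log-odds of state 1 given neighbours and data.
            log_odds = gamma + beta * (n1 - n0) + emission_log_odds(z_scores[i])
            new_state = 1 if log_odds > 0 else 0
            if new_state != states[i]:
                states[i] = new_state
                changed = True
        if not changed:  # stop once a full sweep changes nothing
            break
    return states


# Toy example: genes 0-1-2 form a path, genes 3-4 form a separate edge.
z_scores = [3.0, 0.8, 2.5, 0.8, -0.2]
edges = [(0, 1), (1, 2), (3, 4)]
states = icm_hmrf(z_scores, edges)
print(states)
```

In this toy run, genes 1 and 3 have the same weak z-score of 0.8. Gene 1 ends up labelled 1 because both of its neighbours carry strong signals, while gene 3 stays at 0 because its only neighbour does not. This is the kind of network borrowing that a gene-by-gene analysis cannot provide.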

Acknowledgements

This research was supported in part by NIH grants ES009911 and CA127334.

Author information

Authors and Affiliations

University of Pennsylvania, Philadelphia, USA
Hongzhe Li

Corresponding author

Correspondence to Hongzhe Li.

Editor information

Editors and Affiliations

Institute of Statistics, National Chiao Tung University, Ta Hsueh Road 1001, Hsinchu, 30050, Taiwan, R.O.C.
Henry Horng-Shing Lu
Department of Empirical Inference, MPI for Intelligent Systems, Spemannstraße 38, Tübingen, 72076, Germany
Bernhard Schölkopf
School of Medicine, Dept. Epidemiology & Public Health, Yale University, College Street 60, New Haven, 06520, Connecticut, USA
Hongyu Zhao

About this chapter

Cite this chapter

Li, H. (2011). Hidden Markov Random Field Models for Network-Based Analysis of Genomic Data. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_17

DOI: https://doi.org/10.1007/978-3-642-16345-6_17
Published: 09 April 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16344-9
Online ISBN: 978-3-642-16345-6
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
