Hidden Markov Random Field Models for Network-Based Analysis of Genomic Data

Handbook of Statistical Bioinformatics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

Graphs and networks are common ways of depicting biological information. Many biological processes are represented as graphs, including regulatory networks, metabolic pathways, and protein-protein interaction networks. Such a priori network knowledge is a useful supplement to standard numerical data such as microarray gene expression data and single nucleotide polymorphism (SNP) data. How to incorporate this prior network information into the analysis of numerical data raises interesting statistical problems. Representing genetic networks as undirected graphs, we have developed several approaches, within a unified framework of hidden Markov random field (HMRF) models, for identifying differentially expressed genes and genes or SNPs associated with diseases. Unlike traditional empirical Bayes approaches to the analysis of gene expression data, the HMRF-based models account for the prior dependency among genes on the network and therefore make effective use of the prior network information in identifying subnetworks of genes that are perturbed by experimental conditions. In this chapter, we briefly review the basic setup of the HMRF models and the emission probability functions for some problems often encountered in the analysis of microarray gene expression and SNP data. We also present some interesting areas that require further research.

Acknowledgements

This research was supported in part by NIH grants ES009911 and CA127334.

Author information

Corresponding author

Correspondence to Hongzhe Li.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, H. (2011). Hidden Markov Random Field Models for Network-Based Analysis of Genomic Data. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_17
