
Computational Analysis of ChIP-chip Data

Handbook of Statistical Bioinformatics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)


Abstract

Chromatin immunoprecipitation coupled with genome tiling array hybridization, also known as ChIP-chip, is a powerful technology to identify protein-DNA interactions in genomes. It is widely used to locate transcription factor binding sites and histone modifications. Data generated by ChIP-chip provide important information on gene regulation. This chapter reviews fundamental issues in ChIP-chip data analysis. Topics include data preprocessing, background correction, normalization, peak detection and motif analysis. Statistical models and principles that significantly improve data analysis are discussed. Popular software tools are briefly introduced.



Acknowledgements

This work is partially supported by the Johns Hopkins Faculty Professional Development Fund to H.J. The author would like to thank Jennifer T. Judy for helpful comments and proofreading the draft of this chapter.


Corresponding author

Correspondence to Hongkai Ji.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ji, H. (2011). Computational Analysis of ChIP-chip Data. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_12
