Mining Discriminative Distance Context of Transcription Factor Binding Sites on ChIP Enriched Regions

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4463)

Abstract

Genome-wide identification of transcription factor binding sites (TFBSs) is critical for understanding the transcriptional regulation of gene expression networks. ChIP-chip experiments accelerate the mapping of target TFBSs under diverse cellular conditions. We address the problem of discriminating potential TFBSs in ChIP-enriched regions from those in non-ChIP-enriched regions using ensemble rule algorithms and a variety of predictive variables, including those based on sequence and chromosomal context. In addition, we developed an input variable based on a scoring scheme that reflects the distance context of surrounding putative TFBSs. In experiments focused on hepatocyte regulators, this novel feature improved performance in identifying potential TFBSs, and the measured importance of the predictive variables was consistent with their biological meaning. In summary, we found that distance-based features are better discriminators of ChIP-enriched TFBSs than features based on sequence or chromosomal context.
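
To make the idea of a distance-context feature concrete, the following is a minimal sketch and not the scoring scheme used in the paper, whose exact form the abstract does not give. It assumes, purely for illustration, that each neighboring putative TFBS contributes a Gaussian-kernel weight that decays with its distance from the candidate site. The function name `distance_context_score`, the `bandwidth` parameter, and the example coordinates are all hypothetical.

```python
import math


def distance_context_score(site_pos, neighbor_positions, bandwidth=50.0):
    """Sum Gaussian-kernel weights of neighboring sites by distance.

    Illustrative assumption only: the kernel shape and the bandwidth
    (in base pairs) are not taken from the paper.
    """
    score = 0.0
    for pos in neighbor_positions:
        d = pos - site_pos
        if d == 0:
            # Skip entries at the candidate's own position
            # (this also ignores any other site at the same coordinate).
            continue
        score += math.exp(-(d * d) / (2.0 * bandwidth * bandwidth))
    return score


# Hypothetical putative TFBS positions (bp) within one genomic region.
candidates = [100, 130, 160, 900]
scores = {p: distance_context_score(p, candidates) for p in candidates}
```

Under this assumption, a site surrounded by close neighbors (position 130) scores higher than an isolated one (position 900). A score of this kind could then be supplied to an ensemble classifier as one input variable alongside sequence and chromosomal-context features.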

Editor information

Ion Măndoiu, Alexander Zelikovsky

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, H., Kechris, K.J., Hunter, L. (2007). Mining Discriminative Distance Context of Transcription Factor Binding Sites on ChIP Enriched Regions. In: Măndoiu, I., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2007. Lecture Notes in Computer Science, vol 4463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72031-7_31

  • DOI: https://doi.org/10.1007/978-3-540-72031-7_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72030-0

  • Online ISBN: 978-3-540-72031-7

  • eBook Packages: Computer Science, Computer Science (R0)
