Discovering Transcriptional Modules by Combined Analysis of Expression Profiles and Regulatory Sequences

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2010)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6044)

Abstract

A key goal of gene expression analysis is the characterization of transcription factors (TFs) and micro-RNAs (miRNAs) regulating specific transcriptional programs. The most common approach to address this task is a two-step methodology: In the first step, a clustering procedure is executed to partition the genes into groups that are believed to be co-regulated, based on expression profile similarity. In the second step, a motif discovery tool is applied to search for over-represented cis-regulatory motifs within each group. In an effort to obtain better results by simultaneously utilizing all available information, several studies have suggested computational schemes for a single-step combined analysis of expression and sequence data. Despite extensive research, reverse engineering complex regulatory networks from microarray measurements remains a difficult challenge with limited success, especially in metazoans.

We present Allegro [1], a new method for de novo discovery of TF and miRNA binding sites through joint analysis of genome-wide expression data and promoter or 3’ UTR sequences. In brief, Allegro enumerates a very large number of candidate motifs in a series of refinement phases and converges on high-scoring motifs. For each candidate motif, it executes a cross-validation-like procedure to learn an expression model that describes the shared expression profile of the genes whose cis-regulatory sequences contain the motif. It then computes a p-value for the over-representation of the motif within the genes that best fit the expression profile. The output of Allegro is a non-redundant list of top-scoring motifs and the expression patterns they induce.
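
The over-representation step can be illustrated with a hypergeometric tail test, a common choice for motif enrichment. This is a minimal sketch under that assumption; the abstract does not state which statistic Allegro uses, and the function names `hypergeom_tail` and `motif_enrichment_pvalue` are hypothetical.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """Return P[X >= k] for X ~ Hypergeometric(N, K, n).

    N: genes in the genome, K: genes whose sequence contains the motif,
    n: genes in the candidate module, k: module genes that contain the motif.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def motif_enrichment_pvalue(motif_genes, module_genes, all_genes):
    """Assumed enrichment test: is the motif over-represented in the module?"""
    motif_genes = motif_genes & all_genes
    module_genes = module_genes & all_genes
    return hypergeom_tail(len(all_genes), len(motif_genes),
                          len(module_genes), len(motif_genes & module_genes))


# Toy data (made up for illustration): 10 genes, 4 carry the motif,
# the module has 3 genes, 2 of which carry the motif.
genome = set(range(10))
with_motif = {0, 1, 2, 3}
module = {0, 1, 9}
p = motif_enrichment_pvalue(with_motif, module, genome)
print(round(p, 4))
```

When many candidate motifs are scored, as in the enumeration described above, p-values of this kind would need to be corrected for multiple testing.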

The expression model used by Allegro is a novel log-likelihood-based, non-parametric model, analogous to the position weight matrix commonly used for representing TF binding sites. Unlike most extant methods, our approach does not assume that the expression values follow a pre-defined type of distribution, and it can capture transcriptional modules whose expression profiles differ from the rest of the genome in only a small fraction of the conditions. Furthermore, it successfully handles cases where the expression levels are correlated with the length and GC-content of the cis-regulatory sequences. Such correlations are quite common in practice and often bias existing techniques, leading to false predictions and low sensitivity.
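
The analogy with a position weight matrix can be made concrete with a small sketch. It assumes that expression values have already been discretized into a few levels per condition, and it scores a gene by summing per-condition log-odds between the motif's genes and the background. The discretization, the pseudocount, and the names `learn_model` and `score` are illustrative assumptions, not details taken from the abstract.

```python
from math import log


def learn_model(target_profiles, background_profiles, n_levels, pseudocount=1.0):
    """Return model[c][e] = log(f_c(e) / b_c(e)).

    f_c(e) is the smoothed frequency of expression level e in condition c
    among the target genes; b_c(e) is the same for the background genes.
    Each profile is a list of integer levels in range(n_levels).
    """
    n_conditions = len(background_profiles[0])

    def frequencies(profiles, c):
        counts = [pseudocount] * n_levels
        for profile in profiles:
            counts[profile[c]] += 1
        total = sum(counts)
        return [x / total for x in counts]

    model = []
    for c in range(n_conditions):
        f = frequencies(target_profiles, c)
        b = frequencies(background_profiles, c)
        model.append([log(fe / be) for fe, be in zip(f, b)])
    return model


def score(model, profile):
    """Log-likelihood ratio of one gene's discretized profile under the model."""
    return sum(weights[level] for weights, level in zip(model, profile))


# Toy data (made up): 3 levels (0 = down, 1 = unchanged, 2 = up), 2 conditions.
background = [[0, 0], [1, 1], [2, 2], [0, 1], [1, 2], [2, 0]]
targets = [[2, 0], [2, 0], [2, 1]]  # genes assumed to contain the motif
model = learn_model(targets, background, n_levels=3)

fits = score(model, [2, 0])      # matches the targets' pattern
differs = score(model, [0, 2])   # opposite pattern
print(fits > differs)
```

Because the weights are estimated from level frequencies alone, a model of this kind makes no assumption about the shape of the expression distribution, and a condition in which targets and background look alike contributes weights near zero.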

Allegro introduces several additional unique ideas and features, and is implemented in a graphical, user-friendly software tool. We apply it to several large datasets (>100 conditions) from mouse, fly and human, report on the transcriptional modules it uncovers, and show that it outperforms extant techniques. Allegro is available at http://acgt.cs.tau.ac.il/allegro.

Supported in part by the Israel Science Foundation (grant 802/08 and Converging Technologies Program grant 1767.07).

References

  1. Halperin, et al.: Nucleic Acids Research 37(5), 1566–1579 (2009)

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Halperin, Y., Linhart, C., Ulitsky, I., Shamir, R. (2010). Discovering Transcriptional Modules by Combined Analysis of Expression Profiles and Regulatory Sequences. In: Berger, B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12683-3_42

  • DOI: https://doi.org/10.1007/978-3-642-12683-3_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12682-6

  • Online ISBN: 978-3-642-12683-3

  • eBook Packages: Computer Science, Computer Science (R0)