Using a Solver Over the String Pattern Domain to Analyze Gene Promoter Sequences

Chapter in: Inductive Databases and Constraint-Based Data Mining

Abstract

This chapter illustrates how inductive querying techniques can support knowledge discovery from genomic data. More precisely, it presents a data mining scenario for discovering putative transcription factor binding sites in gene promoter sequences. We do not give technical details of the constraint-based data mining algorithms involved, since these have been described previously. Our contribution is an abstract description of the scenario, its concrete instantiation, and a typical execution on real data. Our main extraction algorithm is a complete solver dedicated to the string pattern domain: it computes the string patterns that satisfy a given conjunction of primitive constraints. We also discuss the processing steps needed to turn it into a useful tool. In particular, we introduce a parameter tuning strategy, an appropriate measure for ranking the patterns, and the post-processing approaches that can be, and have been, applied.

Author information

Corresponding author

Correspondence to Christophe Rigotti.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Rigotti, C., Mitašiūnaitė, I., Besson, J., Meyniel, L., Boulicaut, JF., Gandrillon, O. (2010). Using a Solver Over the String Pattern Domain to Analyze Gene Promoter Sequences. In: Džeroski, S., Goethals, B., Panov, P. (eds) Inductive Databases and Constraint-Based Data Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7738-0_17

  • DOI: https://doi.org/10.1007/978-1-4419-7738-0_17

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-7737-3

  • Online ISBN: 978-1-4419-7738-0

  • eBook Packages: Computer Science, Computer Science (R0)