A Combinatorial Approach to Automatic Discovery of Cluster-Patterns

  • Conference paper
Algorithms in Bioinformatics (WABI 2003)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Abstract

Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups, or clusters, of genes may have an ancient evolutionary origin or may signify some other critical phenomenon, and they may also aid in predicting gene function. Such gene clusters also help in solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In this paper we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the πpattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns without any loss of information. We demonstrate the automatic pattern discovery tool on motifs in E. coli protein sequences.
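
To make the notion of a cluster that recurs in different orders concrete, the following is a minimal, naive sketch and not the algorithm given in the paper. It assumes a cluster is a contiguous window of a fixed size whose symbols recur, in any order, at least a given number of times across the input sequences. The function name `find_permutation_patterns`, its parameters, and the toy gene labels are all hypothetical, and the sketch does not attempt the maximal-pattern notation the abstract mentions.

```python
from collections import defaultdict


def find_permutation_patterns(sequences, size, min_occurrences):
    """Naive illustration (hypothetical helper, not the paper's method).

    Returns {pattern: [(seq_index, start), ...]} for every multiset of
    `size` symbols that occurs as a contiguous window, in any order,
    at least `min_occurrences` times across all sequences.
    """
    occurrences = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq) - size + 1):
            window = seq[start:start + size]
            # Sorting the window makes the key independent of symbol order,
            # so "a b c" and "c a b" map to the same cluster.
            key = tuple(sorted(window))
            occurrences[key].append((seq_index, start))
    return {
        key: locations
        for key, locations in occurrences.items()
        if len(locations) >= min_occurrences
    }


# Toy input: three "genomes" over made-up gene labels.
genomes = [
    ["a", "b", "c", "x", "y"],
    ["y", "c", "a", "b", "x"],
    ["x", "b", "c", "a", "y"],
]
patterns = find_permutation_patterns(genomes, size=3, min_occurrences=3)
print(patterns)
```

On this toy input the only window content shared by all three sequences is the set {a, b, c}, which appears in a different order each time. Enumerating every window size this way reports many redundant sub-clusters, which is the kind of blow-up a notation for maximal patterns is meant to avoid.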



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Eres, R., Landau, G.M., Parida, L. (2003). A Combinatorial Approach to Automatic Discovery of Cluster-Patterns. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_11

  • DOI: https://doi.org/10.1007/978-3-540-39763-2_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20076-5

  • Online ISBN: 978-3-540-39763-2

  • eBook Packages: Springer Book Archive
