Abstract
Functionally related genes often appear in each other's neighborhood on the genome, although the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon, and may also aid in predicting gene function. Such gene clusters also help in solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In this paper we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the πpattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns, without any loss of information. We demonstrate the automatic pattern discovery tool on motifs in E. coli protein sequences.
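To make the notion of an order-independent cluster concrete, the following is a minimal brute-force sketch and not the algorithm of the paper. It assumes a fixed window length and treats two windows as occurrences of the same cluster when they contain the same multiset of symbols. The function name `permutation_patterns` and its parameters are hypothetical, and the sketch does not implement the maximality notation mentioned above.

```python
from collections import defaultdict


def permutation_patterns(sequences, length, min_occurrences):
    """Group fixed-length windows by their multiset of symbols (naive sketch).

    Assumption: a cluster is a window of exactly `length` symbols, and two
    windows match when they hold the same symbols in any order.
    """
    occurrences = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            # Sorting the window gives an order-insensitive signature.
            key = tuple(sorted(window))
            occurrences[key].append((seq_index, start))
    # Keep only signatures seen at least `min_occurrences` times.
    return {
        key: locations
        for key, locations in occurrences.items()
        if len(locations) >= min_occurrences
    }


# Two toy "genomes" in which genes a, b, c are adjacent but in different orders.
genomes = [
    ["a", "b", "c", "x", "y"],
    ["y", "c", "a", "b", "x"],
]
clusters = permutation_patterns(genomes, length=3, min_occurrences=2)
for signature, locations in clusters.items():
    print(signature, locations)
```

Each reported location is a pair of sequence index and window start. Enumerating every window length this way would report many redundant sub-clusters, which is the kind of blow-up a notion of maximal patterns is meant to avoid.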
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eres, R., Landau, G.M., Parida, L. (2003). A Combinatorial Approach to Automatic Discovery of Cluster-Patterns. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_11
DOI: https://doi.org/10.1007/978-3-540-39763-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive