Abstract
In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time complexity of some of the best known algorithms to date is \(O(nt^{2}l^{d}\sigma^{d})\), where t is the number of input sequences, n is the length of each input sequence, and \(\sigma = |\Sigma|\) is the size of the alphabet. The first algorithm that we present in this paper takes \(O(n^{2}t^{2}l^{\frac{d}{2}})\) time and \(O(ntl^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space, and the second algorithm takes \(O(n^{3}t^{3}l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) time using \(O(l^{\frac{d}{2}}\sigma^{\frac{d}{2}})\) space. In practice, our algorithms perform much better when the d/l ratio is small. The second algorithm performs very well even for large values of l and d, as long as the d/l ratio remains small.
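To make the (l,d)-k definition concrete, the following Python sketch performs a naive, exhaustive search. It is not either of the algorithms presented in this paper. It assumes that occurrences are counted over all (possibly overlapping) l-mer positions in all t sequences, and every function name in it (`hamming`, `neighbors`, `neighborhood_size`, `count_occurrences`, `find_monads`) is hypothetical. It also shows one natural source of a term like \(l^{d}\sigma^{d}\). A fixed l-mer has \(\sum_{i=0}^{d}\binom{l}{i}(\sigma-1)^{i} = O(l^{d}\sigma^{d})\) strings within Hamming distance d.

```python
from itertools import combinations, product
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighborhood_size(l: int, d: int, sigma: int) -> int:
    """Count strings within Hamming distance d of a fixed l-mer:
    sum_{i=0}^{d} C(l, i) * (sigma - 1)^i, which is O(l^d * sigma^d)."""
    return sum(comb(l, i) * (sigma - 1) ** i for i in range(d + 1))


def neighbors(word: str, d: int, alphabet: str):
    """Yield every string within Hamming distance d of word, each exactly once."""
    l = len(word)
    for i in range(d + 1):
        for positions in combinations(range(l), i):
            # At each chosen position, substitute a symbol different from word's.
            choices = [[c for c in alphabet if c != word[p]] for p in positions]
            for subs in product(*choices):
                w = list(word)
                for p, c in zip(positions, subs):
                    w[p] = c
                yield "".join(w)


def count_occurrences(pattern: str, sequences: list[str], d: int) -> int:
    """Count l-mer positions (overlaps allowed, an assumption of this sketch)
    across all sequences that lie within Hamming distance d of pattern."""
    l = len(pattern)
    return sum(
        1
        for s in sequences
        for j in range(len(s) - l + 1)
        if hamming(pattern, s[j:j + l]) <= d
    )


def find_monads(sequences: list[str], l: int, d: int, k: int,
                alphabet: str = "ACGT") -> list[str]:
    """Naive (l,d)-k search (hypothetical helper, not the paper's method).
    For k >= 1, any qualifying pattern is within distance d of at least one
    l-mer in the sample, so testing the d-neighborhoods of the sample's own
    l-mers is exhaustive."""
    candidates = set()
    for s in sequences:
        for j in range(len(s) - l + 1):
            candidates.update(neighbors(s[j:j + l], d, alphabet))
    return sorted(p for p in candidates
                  if count_occurrences(p, sequences, d) >= k)


if __name__ == "__main__":
    seqs = ["ACGTACGT", "TTACGTTT"]
    print(find_monads(seqs, 4, 0, 2))  # exact matches occurring at least twice
    print(neighborhood_size(4, 1, 4))  # 1 + 4 * 3 = 13
```

For example, with `seqs = ["ACGTACGT", "TTACGTTT"]`, `find_monads(seqs, 4, 0, 2)` returns `['ACGT', 'TACG']`, since ACGT occurs three times and TACG twice with no mismatches. Because each candidate is rescanned against the whole sample, this sketch is practical only for small l and d.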
This research was partially supported by NSF grant number: ITR-0312724.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Satya, R.V., Mukherjee, A. (2004). New Algorithms for Finding Monad Patterns in DNA Sequences. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_40
DOI: https://doi.org/10.1007/978-3-540-30213-1_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive