A New Method for Finding Approximate Repetitions in DNA Sequences

Wang, Di; Wang, Guoren; Wu, Qingquan; Chen, Baichen; Zhao, Yi

doi:10.1007/11775300_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4016)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Searching for approximate repetitions in DNA sequences is an important topic in gene analysis. One difficulty is that, because patterns vary in length, the similarity between patterns cannot be judged accurately using edit distance (ED) alone. In this paper we define a new similarity function that considers both the differences and the commonalities between patterns. Because this function is expensive to compute, we also propose two new filtering methods, based on frequency distance and Pearson correlation, that efficiently select a candidate set of approximate repetitions. We use SUA instead of a sliding window to extract fragments from a DNA sequence, so the patterns of an approximate repetition are not restricted in length. The results show that our technique finds more approximate repetitions than Tandem Repeats Finder does.
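The abstract names the two filters but does not give their formulas, so the following is only a minimal sketch under stated assumptions. It assumes the frequency distance of Kahveci and Singh (listed in the references), computed over A/C/G/T count vectors, and applies Pearson correlation to the same vectors. The function names, the thresholds `max_fd` and `min_r`, and the use of plain edit distance as the final check are hypothetical stand-ins, not the authors' similarity function; SUA-based fragment extraction is not reproduced, so fragments are simply given as a list.

```python
from itertools import combinations
from math import sqrt

ALPHABET = "ACGT"  # assumption: non-ACGT symbols (e.g. N) are ignored


def freq_vector(s: str) -> list[int]:
    """Counts of A, C, G, T in fragment s."""
    return [s.count(base) for base in ALPHABET]


def frequency_distance(u: list[int], v: list[int]) -> int:
    """Frequency distance as defined by Kahveci and Singh (assumed here):
    the larger of the total surplus and total deficit between count vectors.
    Each unit-cost edit changes it by at most 1, so FD <= edit distance."""
    pos = sum(a - b for a, b in zip(u, v) if a > b)
    neg = sum(b - a for a, b in zip(u, v) if b > a)
    return max(pos, neg)


def pearson(u: list[int], v: list[int]) -> float:
    """Pearson correlation of two count vectors (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # correlation is undefined; treat as uncorrelated
    return cov / (su * sv)


def edit_distance(s: str, t: str) -> int:
    """Unit-cost Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


def candidate_pairs(fragments: list[str], max_fd: int,
                    min_r: float) -> list[tuple[int, int]]:
    """Hypothetical two-stage filter: cheap vector tests before any DP."""
    vecs = [freq_vector(f) for f in fragments]
    kept = []
    for i, j in combinations(range(len(fragments)), 2):
        if frequency_distance(vecs[i], vecs[j]) > max_fd:
            continue  # FD > max_fd implies ED > max_fd: safe to discard
        if pearson(vecs[i], vecs[j]) < min_r:
            continue  # composition profiles too dissimilar (heuristic)
        kept.append((i, j))
    return kept


if __name__ == "__main__":
    frags = ["ACGTACGTAA", "ACGTACGTTA", "GGGGCCCCGG"]
    for i, j in candidate_pairs(frags, max_fd=2, min_r=0.5):
        # Exact check on survivors; plain ED stands in for the paper's measure.
        print(i, j, edit_distance(frags[i], frags[j]))  # prints "0 1 1"
```

Because frequency distance never exceeds edit distance, the first test cannot discard a pair that is truly within `max_fd` edits. The Pearson test carries no such guarantee and may drop genuine repeats, so in this sketch its threshold trades recall for speed.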

References

Mount, D.W.: Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press (2001)
International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 409(6822), 860–921 (2001)
Beleza, S., Alves, C., Gonzalez-Neira, A., Lareu, M., Amorim, A., Carracedo, A., Gusmao, L.: Extending STR markers in Y chromosome haplotypes. Int. J. Legal Med. 117(1), 27–33 (2003)
Young, D.R., Tun, Z., Honda, K., Matoba, R.: Identifying sex chromosome abnormalities in forensic DNA testing using amelogenin and sex chromosome short tandem repeats. J. Forensic Sci. 46(2), 346–348 (2001)
Moore, C.J., Daly, E.M., Tassone, F., et al.: The effect of pre-mutation of X chromosome CGG trinucleotide repeats on brain anatomy. Brain (October 2004)
Benson, G.: An algorithm for finding tandem repeats of unspecified pattern size. In: RECOMB 1998, pp. 20–29. ACM Press, New York (1998)
Landau, G.M., Schmidt, J.P.: An algorithm for approximate tandem repeats. In: Proc. of the 4th Annual Symposium on Combinatorial Pattern Matching, Italy, vol. 684, pp. 120–133 (1993)
Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J., Giegerich, R.: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucl. Acids Res. 29(22), 4633–4642 (2001)
Benson, G., Waterman, M.: A method for fast database search for all k-nucleotide repeats. Nucl. Acids Res. 22, 4828–4836 (1994)
Benson, G.: Tandem repeats finder: a program to analyze DNA sequences. Nucl. Acids Res. 27(2), 573–580 (1999)
Wexler, Y., Yakhini, Z., Kashi, Y., Geiger, D.: Finding approximate tandem repeats in genomic sequences. In: RECOMB 2004, pp. 223–232. ACM Press, New York (2004)
Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Kahveci, T., Singh, A.K.: An efficient index structure for string databases. In: VLDB 2001, pp. 351–360 (2001)
Wang, D., Wang, G., Wu, Q., Chen, B.: Finding LPRs in DNA sequence based on a new index SUA. In: BIBE 2005, pp. 281–284. IEEE Computer Society, Los Alamitos (2005)
Wang, D., Wang, G., Chen, B., Wu, Q., Wang, B., Han, D.: A new lightweight index SUA for biological sequence analysis. J. Huazhong Univ. of Sci. & Tech. 33(12), 207–210 (2005)
Wang, D., Wang, G., Wu, Q., Chen, B.: Finding approximate repetitions in DNA sequence based on SUA. Technology Report (2005), http://mitt.neu.edu.cn

Author information

Authors and Affiliations

College of Information Science & Engineering, Northeastern University, Shenyang, 110004, China
Di Wang, Guoren Wang, Qingquan Wu, Baichen Chen & Yi Zhao
Shanghai Baosight Ltd., Shanghai, 201900, China
Qingquan Wu

Editor information

Editors and Affiliations

Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Department of Computing, Hong Kong Polytechnic University, Hong Kong
Hong Va Leong

About this paper

Cite this paper

Wang, D., Wang, G., Wu, Q., Chen, B., Zhao, Y. (2006). A New Method for Finding Approximate Repetitions in DNA Sequences. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_34

DOI: https://doi.org/10.1007/11775300_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer Science, Computer Science (R0)