Regular Expression Constrained Sequence Alignment

Arslan, Abdullah N.

doi:10.1007/11496656_28

Abdullah N. Arslan

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3537)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

Given strings S₁ and S₂ and a regular expression R, we introduce regular expression constrained sequence alignment: the problem of finding the maximum alignment score between S₁ and S₂ over all alignments that contain a segment in which some substring s₁ of S₁ is aligned with some substring s₂ of S₂, and both s₁ and s₂ match R, i.e. s₁, s₂ ∈ L(R), where L(R) is the regular language described by R. A motivation for the problem is that protein sequences can be aligned so that known motifs guide the alignment. We present an O(nmr) time algorithm for the regular expression constrained sequence alignment problem, where n and m are the lengths of S₁ and S₂, respectively, and r is on the order of the size of the transition function of a finite automaton M that we create from a nondeterministic finite automaton N accepting L(R). M contains O(t²) states if N has t states.
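
To make the problem statement concrete, the following is a minimal sketch of a dynamic program in the same spirit, not the algorithm of the paper. It rests on several assumptions that the abstract does not state: the alignment is global, the scoring scheme is match +1, mismatch -1, gap -1, and the regular expression is supplied as a hand-built deterministic automaton (the hypothetical `delta`, `start`, and `accepting` arguments) rather than converted from R. The function name `constrained_score` is also illustrative. Tracking a pair of automaton states, one for the s₁ side of the segment and one for the s₂ side, loosely mirrors the O(t²) states of M.

```python
from collections import defaultdict

NEG = float("-inf")

def constrained_score(s1, s2, delta, start, accepting,
                      match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s1 and s2 over alignments containing a
    segment whose s1 side and s2 side are both accepted by the automaton.
    Returns None if no such alignment exists. Scoring values are assumptions."""
    n, m = len(s1), len(s2)
    # best[(i, j)][layer] is the best score for aligning s1[:i] with s2[:j].
    # Layers: "pre" (segment not yet opened), a state pair (p, q) (inside the
    # segment; p and q are the states reached on the s1 and s2 sides), and
    # "post" (segment already closed).
    best = defaultdict(lambda: defaultdict(lambda: NEG))
    best[0, 0]["pre"] = 0

    def relax(cell, layer, value):
        if value > best[cell][layer]:
            best[cell][layer] = value

    for i in range(n + 1):
        for j in range(m + 1):
            here = best[i, j]
            # Free moves within a cell: open the segment, then close it
            # wherever both sides are in an accepting state.
            relax((i, j), (start, start), here["pre"])
            for layer, value in list(here.items()):
                if (isinstance(layer, tuple)
                        and layer[0] in accepting and layer[1] in accepting):
                    relax((i, j), "post", value)
            a = s1[i] if i < n else None
            b = s2[j] if j < m else None
            for layer, value in list(here.items()):
                if value == NEG:
                    continue
                if isinstance(layer, tuple):
                    # Inside the segment, a consumed symbol must be readable
                    # by the automaton; a missing transition blocks the move.
                    p, q = layer
                    p2 = delta.get((p, a)) if a is not None else None
                    q2 = delta.get((q, b)) if b is not None else None
                    down, right, diag = (p2, q), (p, q2), (p2, q2)
                    ok_a, ok_b = p2 is not None, q2 is not None
                else:
                    down = right = diag = layer
                    ok_a, ok_b = a is not None, b is not None
                if ok_a:  # s1[i] aligned with a gap
                    relax((i + 1, j), down, value + gap)
                if ok_b:  # s2[j] aligned with a gap
                    relax((i, j + 1), right, value + gap)
                if ok_a and ok_b:  # s1[i] aligned with s2[j]
                    step = match if a == b else mismatch
                    relax((i + 1, j + 1), diag, value + step)

    result = best[n, m]["post"]
    return None if result == NEG else result


# Hypothetical automaton for the regular expression "ab": 0 -a-> 1 -b-> 2.
delta = {(0, "a"): 1, (1, "b"): 2}
print(constrained_score("xab", "aby", delta, start=0, accepting={2}))  # prints 0
```

In the usage line, the only admissible segment aligns "ab" with "ab" (score 2), which forces "x" and "y" into gaps (score -1 each). The sketch visits every cell once per state pair, so its running time grows with nm times the square of the number of automaton states; it makes no attempt to reproduce the paper's O(nmr) bound.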



Author information

Authors and Affiliations

Department of Computer Science, The University of Vermont, Burlington, VT, 05405, USA
Abdullah N. Arslan


Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova
Alberto Apostolico
Université Paris-Est, France
Maxime Crochemore
School of Computer Science and Engineering, Seoul National University, 151-742, Seoul, Korea
Kunsoo Park


About this paper

Cite this paper

Arslan, A.N. (2005). Regular Expression Constrained Sequence Alignment. In: Apostolico, A., Crochemore, M., Park, K. (eds) Combinatorial Pattern Matching. CPM 2005. Lecture Notes in Computer Science, vol 3537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11496656_28


DOI: https://doi.org/10.1007/11496656_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26201-5
Online ISBN: 978-3-540-31562-9
eBook Packages: Computer Science, Computer Science (R0)
