Efficient Algorithms for Regular Expression Constrained Sequence Alignment

  • Conference paper
Combinatorial Pattern Matching (CPM 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Abstract

Imposing constraints is an effective means of incorporating biological knowledge into alignment procedures. As in the PROSITE database, functional sites of proteins can be described effectively as regular expressions, and in an alignment of protein sequences it is natural to expect functional motifs to be aligned together. Motivated by this, Arslan introduced the regular expression constrained sequence alignment problem at CPM 2005 and proposed an algorithm whose time and space can reach $O(|\Sigma|^2 |V|^4 n^2)$ and $O(|\Sigma|^2 |V|^4 n)$, respectively, where $\Sigma$ is the alphabet, $n$ is the sequence length, and $V$ is the set of states of an automaton equivalent to the input regular expression. In this paper we propose a more efficient algorithm for this problem that takes $O(|V|^3 n^2)$ time and $O(|V|^2 n)$ space in the worst case. If $|V| = O(\log n)$, we propose another algorithm with time complexity $O(|V|^2 \log|V| \, n^2)$. The time complexity of our algorithms is independent of $\Sigma$, which is desirable in protein applications, where the formulation of this problem originates: a factor of $|\Sigma|^2 = 400$ in the time complexity of the previously proposed algorithm would significantly affect efficiency in practice.
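
To make the problem statement concrete, the following is a naive brute-force baseline, not the algorithm proposed in the paper or Arslan's algorithm. It assumes one common formalization of the constraint (an assumption, not quoted from the abstract): the alignment must contain a segment in which a substring of the first sequence is aligned with a substring of the second, and both substrings match the regular expression. The scoring values and the function names `global_score`, `matching_spans`, and `constrained_score` are hypothetical choices for illustration. The sketch is far slower than the bounds quoted above and is only practical for very short sequences.

```python
import re

# Hypothetical linear scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -1


def global_score(a: str, b: str) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap cost)."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            cur.append(max(prev[j - 1] + sub, prev[j] + GAP, cur[j - 1] + GAP))
        prev = cur
    return prev[len(b)]


def matching_spans(s: str, pattern: re.Pattern) -> list:
    """All (start, end) pairs such that s[start:end] fully matches pattern."""
    return [
        (i, j)
        for i in range(len(s) + 1)
        for j in range(i, len(s) + 1)
        if pattern.fullmatch(s[i:j])
    ]


def constrained_score(s1: str, s2: str, regex: str):
    """Best score over alignments that align a regex-matching substring of s1
    with a regex-matching substring of s2; None if no such alignment exists.

    Assumed formalization: prefix, constrained segment, and suffix are each
    aligned globally and their scores are added.
    """
    pattern = re.compile(regex)
    spans2 = matching_spans(s2, pattern)
    best = None
    for a, b in matching_spans(s1, pattern):
        for c, d in spans2:
            score = (
                global_score(s1[:a], s2[:c])
                + global_score(s1[a:b], s2[c:d])
                + global_score(s1[b:], s2[d:])
            )
            if best is None or score > best:
                best = score
    return best
```

For example, with `s1 = "GAAA"`, `s2 = "AAAG"`, and the pattern `G`, the unconstrained score under this scheme is 1, while forcing the two `G` residues to be aligned together lowers the best achievable score to -5. This illustrates how a motif constraint can change which alignment is optimal.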

References

  1. Jiang, T., Xu, Y., Zhang, M.Q. (eds.): Current Topics in Computational Molecular Biology. MIT Press, Cambridge (2002)

  2. Tang, C.Y., Lu, C.L., Chang, M.D.T., Tsai, Y.T., Sun, Y.J., Chao, K.M., Chang, J.M., Chiou, Y.H., Wu, C.M., Chang, H.T., Chou, W.I.: Constrained sequence alignment tool development and its application to RNase family alignment. Journal of Bioinformatics and Computational Biology 1, 267–287 (2003)

  3. Chin, F.Y.L., Ho, N.L., Lam, T.W., Wong, P.W.H.: Efficient constrained multiple sequence alignment with performance guarantee. Journal of Bioinformatics and Computational Biology 3(1), 1–18 (2005)

  4. Tsai, Y.T., Huang, Y.P., Yu, C.T., Lu, C.L.: MuSiC: A tool for multiple sequence alignment with constraints. Bioinformatics 20, 2309–2311 (2004)

  5. Lu, C.L., Huang, Y.P.: A memory-efficient algorithm for multiple sequence alignment with constraints. Bioinformatics 21(1), 20–30 (2005)

  6. Arslan, A.N.: Regular expression constrained sequence alignment. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 322–333. Springer, Heidelberg (2005)

  7. Hulo, N., Sigrist, C.J.A., Saux, V.L., Langendijk-Genevaux, P.S., Bordoli, L., Gattiker, A., Castro, E.D., Bucher, P., Bairoch, A.: Recent improvements to the PROSITE database. Nucleic Acids Research 32, 134–137 (2004)

  8. Faisst, S., Meyer, S.: Compilation of vertebrate-encoded transcription factors. Nucleic Acids Research 20(1), 3–26 (1992)

  9. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (2001)

  10. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Comm. ACM 18, 341–343 (1975)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chung, YS., Lu, C.L., Tang, C.Y. (2006). Efficient Algorithms for Regular Expression Constrained Sequence Alignment. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_35

  • DOI: https://doi.org/10.1007/11780441_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35455-0

  • Online ISBN: 978-3-540-35461-1

  • eBook Packages: Computer Science, Computer Science (R0)
