Constrained pairwise and center-star sequences alignment problems

Journal of Combinatorial Optimization

Abstract

Sequence alignment is a fundamental problem in computational biology and is also important in theoretical computer science. In this paper, we consider the problem of aligning a set of sequences subject to a given constrained sequence. Given two sequences \(A=a_1a_2\ldots a_n\) and \(B=b_1b_2\ldots b_n\) with a given distance function and a constrained sequence \(C=c_1c_2\ldots c_k\), our goal is to find an optimal alignment of A and B with respect to the constraint C. We investigate several variants of this problem. If \(C=c^k\), i.e., all characters in C are the same, the optimal constrained pairwise sequence alignment can be computed in \(O(\min \{kn^2,(t-k)n^2\})\) time, where t is the minimum number of occurrences of character c in A and B. For the variant in which the alignment score between any two consecutive constrained characters in the final alignment is upper bounded by some value, called GB-CPSA, we give a dynamic programming algorithm with time complexity \(O(kn^4/\log n)\). For the constrained center-star sequence alignment (CCSA) problem, we prove that finding an optimal alignment is NP-hard even over the binary alphabet. Furthermore, we show a negative result for CCSA: there is no polynomial-time algorithm that approximates CCSA within any constant ratio.
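
To make the basic problem concrete, the following is a minimal Python sketch of the textbook dynamic program for constrained pairwise sequence alignment. It is not the improved algorithm for \(C=c^k\) described above, nor the GB-CPSA algorithm. It assumes the usual reading of the constraint: each character of C must appear, in order, in a column where A and B both carry that character. The names `cpsa_cost` and `unit_dist`, the gap symbol `"-"`, and the unit distance function are illustrative assumptions, not taken from the paper. The table has \(O(kn^2)\) entries, each filled in constant time.

```python
from math import inf


def unit_dist(x, y):
    """Illustrative distance (an assumption): 0 for equal symbols, 1 otherwise."""
    return 0 if x == y else 1


def cpsa_cost(a, b, c, dist=unit_dist, gap="-"):
    """Minimum cost of aligning a and b so that every character of c
    appears, in order, in a column where both sequences hold that character.
    Returns inf when no such alignment exists."""
    n, m, k = len(a), len(b), len(c)
    prev = None  # layer l-1: costs with l-1 constraint characters placed
    for l in range(k + 1):
        # cur[i][j]: best cost for a[:i] vs b[:j] with l constraint chars placed
        cur = [[inf] * (m + 1) for _ in range(n + 1)]
        if l == 0:
            cur[0][0] = 0
        for i in range(n + 1):
            for j in range(m + 1):
                best = cur[i][j]
                if i > 0:  # a[i-1] aligned with a gap
                    best = min(best, cur[i - 1][j] + dist(a[i - 1], gap))
                if j > 0:  # b[j-1] aligned with a gap
                    best = min(best, cur[i][j - 1] + dist(gap, b[j - 1]))
                if i > 0 and j > 0:
                    pair = dist(a[i - 1], b[j - 1])
                    # ordinary column, no constraint character consumed
                    best = min(best, cur[i - 1][j - 1] + pair)
                    # column that realises constraint character c[l-1]
                    if l > 0 and a[i - 1] == b[j - 1] == c[l - 1]:
                        best = min(best, prev[i - 1][j - 1] + pair)
                cur[i][j] = best
        prev = cur
    return prev[n][m]
```

For example, with the unit distance, `cpsa_cost("abc", "cab", "")` is the plain edit distance, while `cpsa_cost("abc", "cab", "c")` is larger because the two occurrences of `c` are forced into the same column.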

Acknowledgments

The authors thank the anonymous referees for their helpful comments to improve the presentation of this paper. This work was supported by NSFC (61433012, U1435215, 11171086), HK RGC Grant (HKU 7114/13E, HKU 7164/12E, HKU 7111/12E), HKU small project funding 201309176064, Natural Science Foundation of Hebei A2013201218, Chinese Academy of Sciences research Grant (No. KGZD-EW-103-5(9)), Fundamental Research Foundation of Northwestern Polytechnical University in China (Grant No. JC201164), Fundamental Research Funds for the Central Universities (Grant No. 3102015ZY081), and China Postdoctoral Science Foundation (Grant No. 2012M521803).

Author information

Corresponding author

Correspondence to Deshi Ye.

Additional information

A preliminary version of this paper appeared in the Proceedings of the 8th International Frontiers of Algorithmics Workshop (FAW 2014), Lecture Notes in Computer Science, Volume 8497, 2014, pp. 309–319.

About this article

Cite this article

Zhang, Y., Chan, J.WT., Chin, F.Y.L. et al. Constrained pairwise and center-star sequences alignment problems. J Comb Optim 32, 79–94 (2016). https://doi.org/10.1007/s10878-015-9914-6

  • DOI: https://doi.org/10.1007/s10878-015-9914-6
