ABSTRACT
Finding the longest common subsequence (LCS) of multiple strings is a well-known problem with applications in fields such as computational biology and computational genomics. Many researchers have studied the problem, and over the years the complexity of its solutions has been improved in several respects. This paper presents a new algorithm, Quick-MLCS, for the general case of the multiple LCS (MLCS) problem, built on one of the fastest existing algorithms. The proposed algorithm follows the dominant point approach and uses a linear sorting technique to minimize the set of dominant points. The key idea is that once the dominant points have been sorted in linear time, a single linear pass is enough to remove the dominated points from the set. Theoretical and experimental evaluations indicate that the new algorithm is more efficient than the fastest existing algorithm across different scenarios.
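The dominant point pipeline described above (expand the current level of dominant points, sort the candidates in linear time, then discard dominated points in one forward pass) can be illustrated with the Python sketch below. It is not the paper's Quick-MLCS implementation. The function names `build_successor_tables`, `radix_sort`, `minimize` and `mlcs_length` are hypothetical, an LSD radix sort is assumed as the linear sorting technique, and the pruning sweep compares each point against the points kept so far, so it is quadratic in the worst case rather than the linear pass the paper reports. The sketch only computes the MLCS length.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, ...]  # one 1-based position per input string; 0 = "before the string"


def build_successor_tables(strings: Sequence[str], alphabet: List[str]) -> List[Dict[str, List[int]]]:
    """table[c][i] = smallest 1-based position j > i with s[j-1] == c, or 0 if none."""
    tables = []
    for s in strings:
        n = len(s)
        table = {c: [0] * (n + 1) for c in alphabet}
        for i in range(n - 1, -1, -1):  # fill right to left
            for c in alphabet:
                table[c][i] = table[c][i + 1]
            if s[i] in table:  # characters outside the common alphabet are ignored
                table[s[i]][i] = i + 1
        tables.append(table)
    return tables


def radix_sort(points: List[Point], bound: int) -> List[Point]:
    """LSD radix sort (hypothetical stand-in for the paper's linear sort).
    Coordinates lie in [0, bound]; runs in O(d * (len(points) + bound))."""
    if not points:
        return points
    for axis in range(len(points[0]) - 1, -1, -1):
        buckets: List[List[Point]] = [[] for _ in range(bound + 1)]
        for p in points:  # stable bucket pass on one coordinate
            buckets[p[axis]].append(p)
        points = [p for bucket in buckets for p in bucket]
    return points


def minimize(sorted_points: List[Point]) -> List[Point]:
    """Keep the minimal points. q dominates p when q[i] <= p[i] on every axis
    (equal points count as dominated, which also removes duplicates). In
    lexicographic order a dominating point always comes first, so one forward
    sweep suffices; the check against `kept` is not linear, unlike the paper's."""
    kept: List[Point] = []
    for p in sorted_points:
        if not any(all(a <= b for a, b in zip(q, p)) for q in kept):
            kept.append(p)
    return kept


def mlcs_length(strings: Sequence[str]) -> int:
    """MLCS length computed level by level over dominant points (a sketch)."""
    if not strings:
        return 0
    # Only characters present in every string can appear in a common subsequence.
    alphabet = sorted(set.intersection(*(set(s) for s in strings)))
    tables = build_successor_tables(strings, alphabet)
    bound = max(len(s) for s in strings)
    level = 0
    dominant: List[Point] = [tuple(0 for _ in strings)]  # the origin
    while True:
        candidates: List[Point] = []
        for p in dominant:
            for c in alphabet:
                q = tuple(tables[k][c][p[k]] for k in range(len(strings)))
                if all(q):  # a 0 means c does not occur after p in some string
                    candidates.append(q)
        if not candidates:
            return level
        dominant = minimize(radix_sort(candidates, bound))
        level += 1
```

For example, `mlcs_length(["ACGT", "AGT", "CGT"])` returns 2 (the common subsequence "GT"). Keeping only the minimal points at each level is safe because a point that is componentwise no larger than another has successors that are also no larger, so discarding the dominated point never shortens the result.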
Index Terms
- Quick-MLCS: a new algorithm for the multiple longest common subsequence problem