research-article

Quick-MLCS: a new algorithm for the multiple longest common subsequence problem

Authors:
Majid Sazvar, Ferdowsi University of Mashhad, Mashhad, Iran
Mahmoud Naghibzadeh, Ferdowsi University of Mashhad, Mashhad, Iran
Nayyereh Saadati, Mashhad University of Medical Sciences, Mashhad, Iran

C3S2E '12: Proceedings of the Fifth International C* Conference on Computer Science and Software Engineering, June 2012, pages 61–66. https://doi.org/10.1145/2347583.2347591

Published: 27 June 2012

ABSTRACT

Finding the longest common subsequence (LCS) of multiple strings is a well-known problem with many applications in fields such as computational biology and computational genomics. The problem has been studied by many researchers, and over the years the complexity of algorithms for it has been reduced in several respects. This paper presents a new algorithm for the general case of the multiple LCS (MLCS) problem that builds on one of the fastest existing algorithms. The proposed algorithm is based on the dominant point approach and uses a linear-time sorting technique to minimize the set of dominant points. The main idea is that once the dominant points have been sorted in linear time, a single linear pass is enough to minimize the dominant point set. Theoretical and experimental evaluations indicate that the proposed algorithm is more efficient than the fastest existing algorithm in a range of scenarios.
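The core idea of the abstract (sort the dominant points in linear time, then minimize the set in one linear pass) can be illustrated for the simplest case of two strings, where each dominant point is a pair of string positions (x, y). The sketch below is our own illustration under stated assumptions, not the authors' Quick-MLCS procedure: it assumes integer coordinates bounded by a parameter `max_coord` (e.g. the string length), uses a two-pass counting sort (an LSD radix sort) as the linear sorting step, and assumes that a point p dominates q when p is coordinate-wise less than or equal to q and p differs from q. The function names are hypothetical, and the abstract does not say how the one-pass step generalizes to more than two strings.

```python
from typing import List, Tuple

# Hypothetical 2-D illustration; not the paper's code.
Point = Tuple[int, int]


def counting_sort(points: List[Point], key: int, max_coord: int) -> List[Point]:
    """Stable counting sort of points on coordinate `key` (0 or 1).

    Runs in O(len(points) + max_coord), i.e. linear time when coordinates
    are bounded by the string length.
    """
    buckets: List[List[Point]] = [[] for _ in range(max_coord + 1)]
    for p in points:
        buckets[p[key]].append(p)  # appending keeps input order, so the sort is stable
    return [p for bucket in buckets for p in bucket]


def minimize_dominant_points(points: List[Point], max_coord: int) -> List[Point]:
    """Return the minimal (non-dominated) points of a 2-D point set.

    Assumption: p dominates q when p[0] <= q[0], p[1] <= q[1] and p != q.
    Duplicate points are collapsed to a single copy.
    """
    # LSD radix sort: sort on the secondary key (y) first, then stably on
    # the primary key (x), which yields lexicographic (x, y) order.
    ordered = counting_sort(points, 1, max_coord)
    ordered = counting_sort(ordered, 0, max_coord)

    minimal: List[Point] = []
    best_y = max_coord + 1  # smallest y seen so far (larger than any coordinate)
    for x, y in ordered:
        # Every earlier point has x' <= x, so (x, y) is dominated or a
        # duplicate exactly when some earlier point has y' <= y.
        if y < best_y:
            minimal.append((x, y))
            best_y = y
    return minimal


if __name__ == "__main__":
    pts = [(3, 1), (1, 4), (2, 2), (2, 5), (4, 0), (1, 4)]
    print(minimize_dominant_points(pts, 5))  # [(1, 4), (2, 2), (3, 1), (4, 0)]
```

Because each surviving point has a strictly smaller y than every earlier survivor, the output forms a "staircase" ordered by increasing x and decreasing y. Duplicates are removed for free, since an equal y never passes the strict comparison.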

Index Terms

1. Theory of computation
  1. Design and analysis of algorithms
    1. Data structures design and analysis
      1. Sorting and searching
  2. Randomness, geometry and discrete structures

Published in
C3S2E '12: Proceedings of the Fifth International C* Conference on Computer Science and Software Engineering
June 2012, 139 pages
ISBN: 9781450310840
DOI: 10.1145/2347583
General Chair: Bipin C. Desai, Concordia University, Canada
Program Chairs: Emil Vassev, University of Limerick, Ireland; Sudhir Mudur, Concordia University, Canada; Bipin C. Desai, Concordia University, Canada
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
DNA sequence
dominant point approach
linear sorting
longest common subsequence
multiple longest common subsequence
protein sequence

Acceptance Rates
Overall Acceptance Rate: 12 of 42 submissions, 29%