Computing the longest common almost-increasing subsequence
Introduction
Given a sequence as input, the longest increasing subsequence (LIS) problem asks for a subsequence of the greatest possible length whose elements are in sorted order, lowest to highest. This classic problem was first tackled by Robinson [3]. Schensted [25] gave the classical dynamic-programming algorithm for the problem, which appears in many algorithms textbooks. This algorithm runs in O(n log k) time, where n is the length of the input sequence and k is the length of the longest increasing subsequence (i.e., the output size). Knuth [19] generalized the problem and related it to Young tableaux. For the comparison-tree model, Fredman [12] showed that n log n - n log log n + O(n) comparisons are both necessary and sufficient to find the length or produce the subsequence. This bound assumes the worst-case number of comparisons (hence, , making it consistent with [25]). On integer alphabets, the fastest known solution runs in time (see [28] and references therein); it relies on the priority search trees of van Emde Boas [11], which provide amortized time per operation when keys are drawn from the set . Liben-Nowell et al. [21] explored the LIS problem in the streaming model, which typically aims to reduce the memory required by the computation to a polylogarithmic amount while still achieving an efficient running time (see also [24]).
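As an illustration of the LIS problem described above, the following is a minimal sketch of the standard binary-search formulation for computing the LIS length. It is not claimed to be the exact algorithm of [25]; the function name `lis_length` and the list `tails` are names chosen here for illustration, and a strictly increasing subsequence is assumed.

```python
from bisect import bisect_left


def lis_length(seq):
    """Return the length of a longest strictly increasing subsequence.

    tails[p] holds the smallest possible last element of an increasing
    subsequence of length p + 1 seen so far (illustrative name).
    """
    tails = []
    for x in seq:
        # First position whose tail is >= x; x extends or improves that slot.
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[pos] = x    # x is a smaller tail for length pos + 1
    return len(tails)
```

Each element triggers one binary search over a list whose length never exceeds the LIS length, which is where the logarithmic factor in the output size comes from.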
A natural extension of the LIS problem is the Longest Common Increasing Subsequence (LCIS) problem [28], which follows from the classic Longest Common Subsequence (LCS) problem. In the LCS problem, given two (or more) input sequences, we are interested in computing a common subsequence of the greatest length. LCS [7], [15], [14] and variants thereof (e.g., [27], [6], [15], [13], [16], [18], [17]) have received significant attention in the literature due to their diverse applications in different branches of science and engineering. It has been proven that there is no truly subquadratic algorithm for this problem unless the Strong Exponential Time Hypothesis (SETH) fails [1], [5]. This also holds for approximate solutions to LCS [2], although it should be noted that the lower bound for approximating LCS is conditioned on a hypothesis much stronger than SETH. Motivated by this result, Bringmann et al. studied the multivariate complexity of LCS [4]. They proved the optimal running time for LCS under SETH as , where n and m are the length of the longer and shorter sequence respectively; d is the number of dominant pairs; and with ℓ being the length of the LCS, are the number of deletions in two sequences. For LCIS we likewise seek a common subsequence, but with the added condition that the computed common subsequence must be increasing. As with the LCS problem, it has been proven that there is no truly subquadratic algorithm for this problem unless SETH fails [8]. Hence, work has been done on algorithms with output-dependent running time. Kutz et al. [20] provided an algorithm with an output-dependent running time of , where n and m are the length of the longer and shorter sequence respectively; ℓ is the length of the LCIS; σ is the size of the alphabet; and is the time to sort each sequence.
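To make the LCIS problem concrete, here is a minimal sketch of the well-known quadratic-time dynamic program for the LCIS length. It is not the output-dependent algorithm of [20]; the names `lcis_length`, `dp`, and `best` are chosen here for illustration, and a strictly increasing common subsequence is assumed.

```python
def lcis_length(a, b):
    """Return the length of a longest common strictly increasing subsequence.

    dp[j] is the length of the best common increasing subsequence found so
    far that ends exactly at b[j] (illustrative name).
    """
    dp = [0] * len(b)
    for x in a:
        best = 0  # best dp[j'] over earlier positions j' with b[j'] < x
        for j, y in enumerate(b):
            if y == x:
                # x can extend any subsequence ending in a smaller value.
                dp[j] = max(dp[j], best + 1)
            elif y < x:
                best = max(best, dp[j])
    return max(dp, default=0)
```

The two nested loops give a running time proportional to the product of the two sequence lengths, which matches the quadratic barrier discussed above.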
Yet another variant of LIS, albeit from a different angle, is the Longest Almost-Increasing Subsequence (LAIS) problem. Introduced by Elmasry in [9], this is a relaxed version of the LIS problem: for a given sequence and a constant , the goal here is to construct a longest subsequence of X such that . Now, S here can be thought of as an almost increasing subsequence of X. Elmasry gave an algorithm that runs in time which is asymptotically optimal for the comparison-tree model (as this algorithm can find LIS by setting ). The algorithm uses pointer-based data structures, namely, jump lists and AVL trees.
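The following sketch checks whether a sequence is almost increasing. It assumes the condition stated later in this paper for appending an element: the maximum of the preceding elements may exceed the new element only by less than c. The function name `is_almost_increasing` is hypothetical and used only for illustration.

```python
def is_almost_increasing(seq, c):
    """Check the almost-increasing condition for the constant c.

    Assumption: every element must satisfy (maximum of all earlier
    elements) - element < c. The first element is always accepted.
    """
    running_max = None
    for x in seq:
        if running_max is not None and running_max - x >= c:
            return False  # x falls too far below an earlier element
        if running_max is None or x > running_max:
            running_max = x
    return True
```

For example, with c = 2 the sequence 1, 3, 2, 4 passes, because 2 is below the earlier maximum 3 by only 1; with c = 1 the same sequence fails.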
The LCAIS problem, first introduced and studied in [23], is a natural extension of the LAIS and LCIS problems. It takes two sequences and and a constant as input and computes a longest common subsequence of X and Y such that . In fact, [23] studied a slightly restricted variant of LCAIS: it was assumed that the given sequences, X and Y, have no repeated elements and that each element is an integer. The authors claimed an algorithm that runs in time. Note that LCAIS with the case actually reduces to the LCIS problem. This is why here we have rather than .
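For very small inputs, the LCAIS problem can be illustrated with an exhaustive search. The sketch below is a brute-force reference only, not one of the algorithms of this paper or of [23]; it takes exponential time. It assumes the almost-increasing condition that the maximum of the preceding elements exceeds each element by less than c, and all function names are hypothetical.

```python
from itertools import combinations


def is_almost_increasing(seq, c):
    """Assumed condition: (max of earlier elements) - element < c."""
    running_max = None
    for x in seq:
        if running_max is not None and running_max - x >= c:
            return False
        if running_max is None or x > running_max:
            running_max = x
    return True


def is_subsequence(sub, seq):
    """Return True if sub occurs in seq in order, not necessarily contiguously."""
    it = iter(seq)
    return all(x in it for x in sub)  # each 'in' consumes the iterator


def lcais_brute_force(xs, ys, c):
    """Return one longest common almost-increasing subsequence of xs and ys.

    Tries candidate subsequences of xs from longest to shortest, so the
    first candidate that qualifies is optimal. Exponential time.
    """
    for length in range(min(len(xs), len(ys)), 0, -1):
        for idx in combinations(range(len(xs)), length):
            cand = [xs[i] for i in idx]
            if is_almost_increasing(cand, c) and is_subsequence(cand, ys):
                return cand
    return []
```

A reference of this kind is mainly useful for cross-checking faster implementations on short random inputs.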
During the publication and review process of the current paper, we became aware of a very recent work of Ta et al. [26], in which the authors presented a dynamic programming algorithm that computes an LCAIS of any two sequences with repeated elements in O(nmℓ) time and O(nm) space, where n and m are the lengths of the two input sequences and ℓ is the length of the output LCAIS. In fact, as reported in [26], Ta et al. first identified a flaw in the algorithm of [23], which motivated them to study this problem in a more general setting than that of [23]. Thus the work of Ta et al. [26] can be seen as the current state of the art for LCAIS algorithms.
In this paper, we revisit the LCAIS problem and present a number of algorithmic approaches to solve the problem efficiently. In particular, we discuss three different approaches, offering different trade-offs, to solve the problem (Sections 3-5). The first approach (Section 3) provides a straightforward way to solve the problem which acts as a baseline algorithm for the subsequent approaches. The second approach (Section 4) improves upon the baseline algorithm and works best when there are fewer elements within the range, c (which might be true in some practical cases). We present four different implementations of this approach with different degrees of improvements: first we give a naive implementation, the second and third implementations improve upon that and then the fourth implementation tries a different approach to improve upon the naive implementation. The third approach (Section 5) again tries to improve upon the baseline algorithm and works best when the number of matches between two sequences is low; it achieves time complexity when elements are repeated at most a constant number of times. Subsequently, we further discuss how to restore an LCAIS in space (Section 6). Finally, we compare our algorithms with the state of the art, i.e., the work of [26] and discuss some interesting future research avenues (Section 7).
Definitions and terminologies
Definition 1 Prefix For a given sequence , the ith prefix of X is denoted by where .
Note that all sequences are assumed to consist of real numbers.
Definition 2 MSeq For two
Approach 1: tracking the maximum
In this section we present our first approach to solving the LCAIS problem. If we want to append an element to the end of an existing almost-increasing sequence and keep it almost-increasing, we need to check the maximum element of that sequence. We can append the element if and only if the maximum element exceeds the element being appended by less than c. So, for an element, we need to keep track of the other elements that exceed it by less than c. To do it efficiently, we can keep
Approach 2: tracking the last element
In addition to keeping track of the maximum elements, as has been done in the previous section, we now present a different formulation through keeping track of the last elements of the sequences (i.e., index thereof). Here, we will treat the input sequences asymmetrically. We will consider the subsequences that end on a specific index of one input sequence.
Approach 3: tracking the index of lengths
If we carefully scrutinize Table b of Fig. 1a, we can see some particular patterns as follows (formal proofs follow).
- In each column, the values start from zero at the top and, going down, either increase by one or stay the same. This pattern is very common in problems related to LCS.
- When the value goes up, it is in either a green or a blue cell.
- Each blue cell is the maximum of its top and left cells.
Recovering the output sequence
Up to this point we have only discussed how to calculate the length of an LCAIS. To recover an LCAIS, we can simply keep track of the length changes, but this would take cubic space. It is possible to make the recovery more space efficient (quadratic space). We will discuss both a simple and an improved recovery method for our second approach (Section 4). The methods can be easily extended to our third approach (Section 5). We only show the improved recovery method for this (Section 6.2.6). We do not
Discussions
At this point, a comparative study of our algorithms with the algorithm presented in [26] is in order. Note that none of our algorithms is methodologically similar to their algorithm.
However, our first algorithm can be seen as a simpler version of their approach, as our algorithm is a direct extension of LCS. Both approaches keep track of the lengths of all LCAISs of all pairs of prefixes of X and Y. Our method keeps the lengths of the LCAISs based on the value of the maximum element in
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (28)
- On computing the length of longest increasing subsequences. Discrete Math. (1975)
- et al. Algorithms for computing variants of the longest common subsequence problem. Theor. Comput. Sci. (2008)
- et al. New efficient algorithms for the LCS and constrained LCS problems. Inf. Process. Lett. (2008)
- Finite automata based algorithms on subsequences and supersequences of degenerate strings. J. Discret. Algorithms (2010)
- Faster algorithms for computing longest common increasing subsequences. Selected Papers from the 17th Annual Symposium on Combinatorial Pattern Matching (CPM 2006). J. Discret. Algorithms (2011)
- et al. Computing a longest common almost-increasing subsequence of two sequences. Theor. Comput. Sci. (2021)
- The constrained longest common subsequence problem. Inf. Process. Lett. (2003)
- et al. A fast algorithm for computing a longest common increasing subsequence. Inf. Process. Lett. (2005)
- et al. Tight hardness results for LCS and other sequence similarity measures
- et al. Fast and deterministic constant factor approximation algorithms for LCS imply new circuit lower bounds