Elsevier

Theoretical Computer Science

Volume 930, 21 September 2022, Pages 157-178

Computing the longest common almost-increasing subsequence

https://doi.org/10.1016/j.tcs.2022.07.021

Abstract

In this paper, we revisit the problem of computing a longest common almost-increasing subsequence (LCAIS) where, given two input sequences, the goal is to compute a common subsequence that is ‘almost’ increasing. The concept of an almost-increasing subsequence offers an interesting relaxation of the increasing condition. This problem has been studied in the literature, albeit with some constraints. Here, we present a number of algorithmic approaches to solve the problem more generally and efficiently.

Introduction

Given a sequence as input, the longest increasing subsequence (LIS) problem seeks a subsequence of the highest possible length whose elements are in sorted order, lowest to highest. This classic problem was first tackled by Robinson [3]. Schensted [25] gave the classical dynamic-programming algorithm for the problem, which appears in many algorithmic textbooks. This algorithm runs in $O(n \log k)$ time, where $n$ is the length of the input sequence and $k$ is the length of the longest increasing subsequence (i.e., the output size). Knuth [19] gave generalizations of the problem with relations to Young tableaux. For the comparison-tree model, Fredman [12] showed that $O(n \log n)$ comparisons are both necessary and sufficient to find the length or produce the subsequence. This bound assumes the worst-case number of comparisons (hence $k$ is on the order of $n$, making it consistent with [25]). On integer alphabets, the fastest known solution runs in $O(n \log\log n)$ time (see [28] and references therein); it relies on the priority search trees of van Emde Boas [11], which provide $O(\log\log n)$ amortized time per operation when keys are drawn from the set $\{1, 2, \ldots, n\}$. Liben-Nowell et al. [21] have explored the LIS problem in the streaming model, which typically aims at reducing the memory required by the computation to a poly-logarithmic amount, in addition to achieving an efficient running time (see also [24]).
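The $O(n \log k)$ bound quoted above can be illustrated with a common textbook formulation based on binary search. The sketch below is not taken from [25]; the function name `lis_length` is an illustrative choice, and it computes only the length of a strictly increasing subsequence.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of a longest strictly increasing subsequence of seq."""
    # tails[k] is the smallest possible last element of an increasing
    # subsequence of length k + 1 found so far; tails stays sorted.
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)  # index of the first tail >= x
        if pos == len(tails):
            tails.append(x)          # x extends the longest subsequence so far
        else:
            tails[pos] = x           # x is a smaller tail for length pos + 1
    return len(tails)
```

Because `tails` never holds more than $k$ entries, each binary search costs $O(\log k)$, giving $O(n \log k)$ overall.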

A natural extension of the LIS problem is the Longest Common Increasing Subsequence (LCIS) problem [28], which follows from the classic Longest Common Subsequence (LCS) problem. In the LCS problem, given two (or more) input sequences, we are interested in computing a common subsequence of the highest length. LCS [7], [15], [14] and variants thereof (e.g., [27], [6], [15], [13], [16], [18], [17]) have received significant attention in the literature due to their diverse applications in different branches of science and engineering. It has been proven that there is no truly subquadratic algorithm for this problem unless the Strong Exponential Time Hypothesis (SETH) fails [1], [5]. This also holds for approximate solutions to LCS [2], although the lower bound for approximating LCS is conditioned on a hypothesis much stronger than SETH. Motivated by this result, Bringmann et al. studied the multivariate complexity of LCS [4]. They proved that the optimal running time for LCS under SETH is $(n + \min\{d, \delta\Delta, \delta m\})^{1 \pm o(1)}$, where $n$ and $m$ are the lengths of the longer and shorter sequence, respectively; $d$ is the number of dominant pairs; and, with $\ell$ being the length of the LCS, $\delta = n - \ell$ and $\Delta = m - \ell$ are the numbers of deletions in the two sequences. For LCIS we likewise seek a common subsequence, but with the added condition that it must be increasing. Similar to the LCS problem, it has been proven that there is no truly subquadratic algorithm for this problem unless SETH fails [8]. Hence, work has been done on algorithms with output-dependent running time. Kutz et al. [20] provided an algorithm with an output-dependent running time of $O((n + m\ell) \log\log \sigma + \mathit{Sort})$, where $n$ and $m$ are the lengths of the longer and shorter sequence, respectively; $\ell$ is the length of the LCIS; $\sigma$ is the size of the alphabet; and $\mathit{Sort}$ is the time to sort each sequence.

Yet another variant of LIS, albeit from a different angle, is the Longest Almost-Increasing Subsequence (LAIS) problem. Introduced by Elmasry in [9], this is a relaxed version of the LIS problem: for a given sequence $X = x_1, x_2, \ldots, x_n$ and a constant $c \ge 0$, the goal is to construct a longest subsequence $S = s_1 s_2 \cdots s_\ell$ of $X$ such that $\forall i \in [2..\ell]: \max_{j=1}^{i-1} s_j < s_i + c$. Such an $S$ can be thought of as an almost-increasing subsequence of $X$. Elmasry gave an algorithm that runs in $O(n \log \ell)$ time, which is asymptotically optimal for the comparison-tree model (as this algorithm can find an LIS by setting $c = 0$). The algorithm uses pointer-based data structures, namely jump lists and AVL trees.
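The almost-increasing condition compares each element with the maximum of everything before it, so it can be checked in one pass with a running maximum. The following is a minimal sketch of that check only (it is not Elmasry's algorithm); the name `is_almost_increasing` is illustrative.

```python
def is_almost_increasing(seq, c):
    """True if max(s_1, ..., s_{i-1}) < s_i + c holds for every i >= 2."""
    running_max = None
    for s in seq:
        # The first element has no predecessors, so nothing to check for it.
        if running_max is not None and not (running_max < s + c):
            return False
        running_max = s if running_max is None else max(running_max, s)
    return True
```

For instance, `[1, 5, 4, 6]` passes with `c = 2` because 4 falls short of the earlier maximum 5 by less than 2, but fails with `c = 1`. With `c = 0` the check reduces to strict monotonicity.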

The LCAIS problem, first introduced and studied in [23], is a natural extension of the LAIS and LCIS problems. It takes two sequences $X = x_1, x_2, \ldots, x_m$ and $Y = y_1, y_2, \ldots, y_n$ and a constant $c > 0$ as input and computes a longest common subsequence $S = s_1, s_2, \ldots, s_\ell$ of $X$ and $Y$ such that $\forall i \in [2..\ell]: \max_{j=1}^{i-1} s_j < s_i + c$. In fact, [23] studied a slightly restricted variant of LCAIS: it was assumed that the given sequences $X$ and $Y$ do not have any repeated elements and that each element is an integer. The authors claimed to present an algorithm that runs in $O(n(m + c^2))$ time. Note that LCAIS with $c = 0$ reduces to the LCIS problem, which is why here we have $c > 0$ rather than $c \ge 0$.
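To make the problem statement concrete, the definition can be turned directly into an exhaustive search. The sketch below is a reference implementation for very small inputs only (it is exponential in the length of the shorter sequence) and is not any of the algorithms from [23] or from this paper; all function names are illustrative.

```python
from itertools import combinations


def is_subsequence(sub, seq):
    """True if sub can be obtained from seq by deleting elements."""
    it = iter(seq)
    return all(any(s == t for t in it) for s in sub)


def almost_increasing(seq, c):
    """True if every element is greater than (earlier maximum - c)."""
    best = float("-inf")
    for s in seq:
        if not best < s + c:
            return False
        best = max(best, s)
    return True


def lcais_bruteforce(x, y, c):
    """Return one longest common almost-increasing subsequence of x and y."""
    short, long_ = (x, y) if len(x) <= len(y) else (y, x)
    # Try candidate lengths from longest to shortest; the first hit is optimal.
    for size in range(len(short), 0, -1):
        for idx in combinations(range(len(short)), size):
            cand = [short[i] for i in idx]
            if almost_increasing(cand, c) and is_subsequence(cand, long_):
                return cand
    return []
```

With `x = [1, 5, 1, 3, 2, 4]` and `y = [5, 1, 2, 4, 3]`, the only common subsequence of length 4 is `[5, 1, 2, 4]`; it is almost increasing for `c = 5` but not for `c = 2`, where the best achievable length is 3.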

During the publication and review process of the current paper, we came to know of a very recent work of Ta et al. [26], where the authors presented a dynamic programming algorithm that can compute an LCAIS between any two sequences with repeated elements in $O(nm\ell)$ time and $O(nm)$ space, where $n$ and $m$ are the lengths of the two input sequences and $\ell$ is the length of the output LCAIS. In fact, as reported in [26], Ta et al. first identified a flaw in the algorithm of [23], which motivated them to study this problem in a more general setting than that of [23]. Thus the work of Ta et al. [26] can be seen as the current state of the art for LCAIS algorithms.

In this paper, we revisit the LCAIS problem and present a number of algorithmic approaches to solve the problem efficiently. In particular, we discuss three different approaches, offering different trade-offs, to solve the problem (Sections 3-5). The first approach (Section 3) provides a straightforward way to solve the problem which acts as a baseline algorithm for the subsequent approaches. The second approach (Section 4) improves upon the baseline algorithm and works best when there are fewer elements within the range, c (which might be true in some practical cases). We present four different implementations of this approach with different degrees of improvements: first we give a naive implementation, the second and third implementations improve upon that and then the fourth implementation tries a different approach to improve upon the naive implementation. The third approach (Section 5) again tries to improve upon the baseline algorithm and works best when the number of matches between two sequences is low; it achieves O(mn) time complexity when elements are repeated at most a constant number of times. Subsequently, we further discuss how to restore an LCAIS in O(mn) space (Section 6). Finally, we compare our algorithms with the state of the art, i.e., the work of [26] and discuss some interesting future research avenues (Section 7).

Definitions and terminologies

Definition 1 Prefix

For a given sequence $X = x_1, x_2, \ldots, x_m$, the $i$th prefix of $X$ is denoted by $X_i = x_1, x_2, \ldots, x_i$, where $i = 0, 1, \ldots, m$.

For example, if $X = 1, 5, 1, 3, 2, 4$, then $X_3 = 1, 5, 1$ and $X_0$ is the empty sequence. Note that we use capital letters to denote sequences and small letters to denote symbols within a sequence. Thus $X$ and $X_i$ denote a sequence and its $i$th prefix, whereas $x_i$ denotes the $i$th symbol of $X$.

Note that all sequences are assumed to consist of real numbers.

Definition 2 MSeq

For two

Approach 1: tracking the maximum

In this section we present our first approach to solving the LCAIS problem. If we want to append an element to the end of an existing almost-increasing sequence and keep it almost-increasing, we need to check the maximum element of that sequence. We can append the element if and only if the maximum element exceeds it by less than c. So, for each element, we need to keep track of the other elements that exceed it by less than c. To do it efficiently, we can keep
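Since the section text is cut off here, the following is only one possible reading of the maximum-tracking idea, not the authors' algorithm. It is a sketch under stated assumptions: an LCS-style dynamic program whose state additionally records the maximum value of the subsequence built so far, so that the append rule above can be tested in constant time per state. The name `lcais_length` and the table layout are hypothetical.

```python
def lcais_length(x, y, c):
    """Length of a longest common almost-increasing subsequence of x and y."""
    values = sorted(set(x) & set(y))          # only shared values can be matched
    index = {v: k for k, v in enumerate(values)}
    # prev[j][v]: best length over (processed prefix of x, y[:j]) among common
    # almost-increasing subsequences whose maximum is values[v]; 0 = none yet.
    prev = [[0] * len(values) for _ in range(len(y) + 1)]
    for xi in x:
        cur = [[0] * len(values) for _ in range(len(y) + 1)]
        for j, yj in enumerate(y, start=1):
            # Skip xi or skip yj, exactly as in the LCS recurrence.
            row = [max(p, q) for p, q in zip(prev[j], cur[j - 1])]
            if xi == yj:
                a = index[xi]
                row[a] = max(row[a], 1)       # start a new subsequence here
                for w, length in enumerate(prev[j - 1]):
                    # Append xi only if the old maximum is below xi + c.
                    if length and values[w] < xi + c:
                        top = max(w, a)       # values is sorted, so this is the new maximum
                        row[top] = max(row[top], length + 1)
            cur[j] = row
        prev = cur
    return max(prev[len(y)], default=0)
```

This sketch runs in time proportional to the product of the two sequence lengths and the number of distinct shared values, and keeps only two layers of the table. It returns the length only.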

Approach 2: tracking the last element

In addition to keeping track of the maximum elements, as has been done in the previous section, we now present a different formulation through keeping track of the last elements of the sequences (i.e., index thereof). Here, we will treat the input sequences asymmetrically. We will consider the subsequences that end on a specific index of one input sequence.

Approach 3: track the index of lengths

If we carefully scrutinize Table b of Fig. 1a, we can see some particular patterns as follows (formal proofs follow).

  • In each column, the values start from zero at the top and, going down, each value either increases by one or stays the same. This pattern is very common in problems related to LCS.

  • When the value goes up, it does so in either a green or a blue cell.

  • Each blue cell is the maximum of its top and left cells.

This gives us the idea to follow only the topmost index (i.e., the minimum j index) of a θc value.

Recovering the output sequence

Up to this point we have only discussed how to calculate the length of an LCAIS. To recover an LCAIS, we can simply keep track of the length changes, but this would take cubic space. It is possible to make the recovery more space efficient (quadratic space). We will discuss both a simple and an improved recovery method for our second approach (Section 4). The methods can be easily extended to our third approach (Section 5), for which we only show the improved recovery method (Section 6.2.6). We do not

Discussions

At this point, a comparative study of our algorithms with the algorithm presented in [26] is in order. Note that none of our algorithms is methodologically similar to their algorithm.

However, our first algorithm can be seen as a simpler version of their approach, as our algorithm is a direct extension of LCS. Both approaches keep track of the lengths of all LCAISs of all pairs of prefixes of X and Y. Our method keeps the length of the LCAISs based on the value of the maximum element in

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (28)

  • G. de B. Robinson

    On the representations of the symmetric group

    Am. J. Math.

    (1938)
  • Karl Bringmann et al.

    Multivariate fine-grained complexity of longest common subsequence

  • Karl Bringmann et al.

    Quadratic conditional lower bounds for string problems and dynamic time warping

  • Yi-Ching Chen et al.

    On the generalized constrained longest common subsequence problems

    J. Comb. Optim.

    (Aug. 2009)