Pattern Matching Under $\textrm{DTW}$ Distance

Gourdel, Garance; Driemel, Anne; Peterlongo, Pierre; Starikovskaya, Tatiana

doi:10.1007/978-3-031-20643-6_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13617)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

In this work, we consider the problem of pattern matching under the dynamic time warping ($\textrm{DTW}$) distance, motivated by potential applications in the analysis of biological data produced by third-generation sequencing. To measure the $\textrm{DTW}$ distance between two strings, one must “warp” them, that is, double some letters in the strings to obtain two equal-length strings, and then sum the distances between the letters in the corresponding positions. When the distances between letters are integers, we show that for a pattern P with m runs and a text T with n runs:

1. There is an $\mathcal{O}(m+n)$-time algorithm that computes all locations where the $\textrm{DTW}$ distance from P to T is at most 1;
2. There is an $\mathcal{O}(kmn)$-time algorithm that computes all locations where the $\textrm{DTW}$ distance from P to T is at most k.

As a corollary of the second result, we also derive an approximation algorithm for general metrics on the alphabet.
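The definition above can be illustrated with the textbook quadratic dynamic program, which is not the run-length-based algorithm of this paper. The following is a minimal sketch under stated assumptions: the function names `dtw` and `dtw_matches`, the convention of reporting 1-based end positions of matching substrings, and the unit letter distance used in the usage note are all illustrative choices, not taken from the paper.

```python
from math import inf

def dtw(x, y, dist):
    """DTW distance between two non-empty sequences (textbook O(|x||y|) DP)."""
    n, m = len(x), len(y)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either advance in both strings, or double a letter of y or of x.
            best_prev = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = dist(x[i - 1], y[j - 1]) + best_prev
    return D[n][m]

def dtw_matches(pattern, text, k, dist):
    """1-based end positions b such that some substring of text ending at b
    is within DTW distance k of pattern (assumed reporting convention)."""
    n = len(text)
    prev = [0] * (n + 1)  # row 0 is all zeros: a match may start anywhere
    for a in range(1, len(pattern) + 1):
        cur = [inf] * (n + 1)
        for b in range(1, n + 1):
            best_prev = min(prev[b - 1], prev[b], cur[b - 1])
            cur[b] = dist(pattern[a - 1], text[b - 1]) + best_prev
        prev = cur
    return [b for b in range(1, n + 1) if prev[b] <= k]
```

With a unit distance (0 for equal letters, 1 otherwise), `dtw("aab", "ab", unit)` is 0 because doubling the `a` of the second string makes the two strings equal, and `dtw_matches("ab", "xaabbx", 0, unit)` reports the end positions 4 and 5.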

This work was partially funded by the grants ANR-20-CE48-0001, ANR-19-CE45-0008 SeqDigger and ANR-19-CE48-0016 from the French National Research Agency.

Notes

1. The preprocessing time $\mathcal{O}(|\varSigma|^2 \log L)$ that is required to embed $\mu$ into a well-separated metric is not accounted for in the runtime of the algorithm.

References

1. Abboud, A., Backurs, A., Williams, V.V.: Tight hardness results for LCS and other sequence similarity measures. In: FOCS 2015, pp. 59–78. IEEE Computer Society (2015). https://doi.org/10.1109/FOCS.2015.14
2. Amarasinghe, S.L., Su, S., Dong, X., Zappia, L., Ritchie, M.E., Gouil, Q.: Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21(1), 1–16 (2020)
3. Bansal, N., Buchbinder, N., Madry, A., Naor, J.: A polylogarithmic-competitive algorithm for the k-server problem. In: FOCS 2011, pp. 267–276 (2011). https://doi.org/10.1109/FOCS.2011.63
4. Braverman, V., Charikar, M., Kuszmaul, W., Woodruff, D.P., Yang, L.F.: The one-way communication complexity of dynamic time warping distance. In: SoCG 2019. LIPIcs, vol. 129, pp. 16:1–16:15 (2019). https://doi.org/10.4230/LIPIcs.SoCG.2019.16
5. Bringmann, K., Künnemann, M.: Quadratic conditional lower bounds for string problems and dynamic time warping. In: FOCS 2015, pp. 79–97 (2015). https://doi.org/10.1109/FOCS.2015.15
6. Chen, J.Q., Wu, Y., Yang, H., Bergelson, J., Kreitman, M., Tian, D.: Variation in the ratio of nucleotide substitution and indel rates across genomes in mammals and bacteria. Mol. Biol. Evol. 26(7), 1523–1531 (2009). https://doi.org/10.1093/molbev/msp063
7. Driemel, A., Silvestri, F.: Locality-sensitive hashing of curves. In: SoCG 2017. LIPIcs, vol. 77, pp. 37:1–37:16 (2017). https://doi.org/10.4230/LIPIcs.SoCG.2017.37
8. Dupont, M., Marteau, P.-F.: Coarse-DTW for sparse time series alignment. In: Douzal-Chouakria, A., Vilar, J.A., Marteau, P.-F. (eds.) AALTD 2015. LNCS (LNAI), vol. 9785, pp. 157–172. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44412-3_11
9. Emiris, I.Z., Psarros, I.: Products of Euclidean metrics and applications to proximity questions among curves. In: SoCG 2018. LIPIcs, vol. 99, pp. 37:1–37:13 (2018). https://doi.org/10.4230/LIPIcs.SoCG.2018.37
10. Fakcharoenphol, J., Rao, S., Talwar, K.: A tight bound on approximating arbitrary metrics by tree metrics. In: STOC 2003, pp. 448–455 (2003). https://doi.org/10.1145/780542.780608
11. Froese, V., Jain, B.J., Rymar, M., Weller, M.: Fast exact dynamic time warping on run-length encoded time series. CoRR abs/1903.03003 (2019)
12. Gold, O., Sharir, M.: Dynamic time warping and geometric edit distance: breaking the quadratic barrier. ACM Trans. Algorithms 14(4), 50:1–50:17 (2018). https://doi.org/10.1145/3230734
13. Gonzalez-Garay, M.L.: Introduction to isoform sequencing using Pacific Biosciences technology (Iso-Seq). In: Wu, J. (ed.) Transcriptomics and Gene Regulation. TRBIO, vol. 9, pp. 141–160. Springer, Dordrecht (2016). https://doi.org/10.1007/978-94-017-7450-5_6
14. Huang, Y.T., Liu, P.Y., Shih, P.W.: Homopolish: a method for the removal of systematic errors in nanopore sequencing by homologous polishing. Genome Biol. 22(1), 95 (2021). https://doi.org/10.1186/s13059-021-02282-6
15. Hwang, Y., Gelfand, S.B.: Sparse dynamic time warping. In: Perner, P. (ed.) MLDM 2017. LNCS (LNAI), vol. 10358, pp. 163–175. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62416-7_12
16. Hwang, Y., Gelfand, S.B.: Binary sparse dynamic time warping. In: MLDM 2019, pp. 748–759. ibai Publishing (2019)
17. Kuszmaul, W.: Dynamic time warping in strongly subquadratic time: algorithms for the low-distance regime and approximate evaluation. In: ICALP 2019. LIPIcs, vol. 132, pp. 80:1–80:15 (2019). https://doi.org/10.4230/LIPIcs.ICALP.2019.80
18. Kuszmaul, W.: Dynamic time warping in strongly subquadratic time: algorithms for the low-distance regime and approximate evaluation. CoRR abs/1904.09690 (2019). https://doi.org/10.48550/ARXIV.1904.09690
19. Kuszmaul, W.: Binary dynamic time warping in linear time. CoRR abs/2101.01108 (2021)
20. Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27(2), 557–582 (1998). https://doi.org/10.1137/S0097539794264810
21. Landau, G.M., Vishkin, U.: Fast string matching with k differences. J. Comput. Syst. Sci. 37(1), 63–78 (1988). https://doi.org/10.1016/0022-0000(88)90045-1
22. Li, H.: Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18), 3094–3100 (2018). https://doi.org/10.1093/bioinformatics/bty191
23. Mahmoud, M., Gobet, N., Cruz-Dávalos, D.I., Mounier, N., Dessimoz, C., Sedlazeck, F.J.: Structural variant calling: the long and the short of it. Genome Biol. 20(1), 1–14 (2019). https://doi.org/10.1186/s13059-019-1828-7
24. Mueen, A., Chavoshi, N., Abu-El-Rub, N., Hamooni, H., Minnich, A.: AWarp: fast warping distance for sparse time series. In: ICDM 2016, pp. 350–359. IEEE (2016)
25. Nishi, A., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Towards efficient interactive computation of dynamic time warping distance. In: Boucher, C., Thankachan, S.V. (eds.) SPIRE 2020. LNCS, vol. 12303, pp. 27–41. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59212-7_3
26. Sakai, Y., Inenaga, S.: A reduction of the dynamic time warping distance to the longest increasing subsequence length. In: ISAAC 2020. LIPIcs, vol. 181, pp. 6:1–6:16 (2020). https://doi.org/10.4230/LIPIcs.ISAAC.2020.6
27. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Sig. Process. 26(1), 43–49 (1978)

Author information

Authors and Affiliations

DIENS, École normale supérieure de Paris, PSL Research University, Paris, France
Garance Gourdel & Tatiana Starikovskaya
IRISA Inria Rennes, Rennes, France
Garance Gourdel & Pierre Peterlongo
Hausdorff Center for Mathematics, University of Bonn, Bonn, Germany
Anne Driemel

Corresponding author

Correspondence to Garance Gourdel.

Editor information

Editors and Affiliations

Universidad Técnica Federico Santa María, Valparaíso, Chile
Diego Arroyuelo
Universidad de Chile, Santiago, Chile
Barbara Poblete

Appendices

Appendix A

Lemma 2

Consider a block $B = D[i_p\mathinner {.\,.}j_p, i_t \mathinner {.\,.}j_t]$ and cell (a, b) in it. If $i_p \le a < j_p$, then $D[a,b] \le D[a+1,b]$ and if $i_t \le b < j_t$, then $D[a,b] \le D[a,b+1]$.

Proof

Let us first give an equivalent statement of the lemma: if (a, b) and $(a+1,b)$ are in the same block, then $D[a,b] \le D[a+1,b]$, and if (a, b) and $(a,b+1)$ are in the same block, then $D[a,b] \le D[a,b+1]$.

We show the lemma by induction on $a+b$. The base cases of the induction are the cells such that $a = 0$ or $b = 0$, and for them the statement holds by the definition of D. Consider now a cell (a, b), where $a,b \ge 1$. Assume that the induction hypothesis holds for all cells (x, y) such that $x+y < a+b$. By Eq. 1, we have:

$$\begin{aligned} D[a, b]&= \min \{ D[a-1, b-1], D[a-1, b], D[a, b-1]\} + d\\ D[a+1, b]&= \min \{ D[a, b-1], D[a, b], D[a+1, b-1]\} + d\\ D[a, b+1]&= \min \{ D[a-1, b], D[a-1, b+1], D[a, b]\} + d \end{aligned}$$

Assume that (a, b) and $(a+1,b)$ are in the same block. We have $D[a,b] \le D[a, b-1]+d$ and trivially $D[a,b] \le D[a,b] + d$. By the induction assumption, $D[a,b-1] \le D[a+1,b-1]$ (the cells $(a,b-1)$ and $(a+1,b-1)$ must belong to the same block). Therefore,

$$\begin{aligned} D[a+1,b]&= \min \{ D[a, b-1], D[a, b], D[a+1, b-1]\} + d \\&= \min \{ D[a, b-1] + d, D[a, b] + d, D[a+1, b-1] + d\} \\&\ge \min \{D[a,b], D[a,b], D[a,b-1]+d\} \\&\ge \min \{D[a,b], D[a,b], D[a,b]\} = D[a,b]. \end{aligned}$$

Assume now that (a, b) and $(a,b+1)$ are in the same block. We have $D[a,b] \le D[a-1, b]+d$. Furthermore, as $(a-1,b)$ and $(a-1,b+1)$ are in the same block, we have $D[a-1,b] \le D[a-1,b+1]$ by the induction assumption. Therefore,

$$\begin{aligned} D[a,b+1]&= \min \{ D[a-1, b], D[a-1, b+1], D[a, b]\} + d\\&= \min \{ D[a-1, b] + d, D[a-1, b+1] + d, D[a, b] + d\}\\&\ge \min \{D[a-1,b]+d, D[a-1,b]+d, D[a,b]\}\\&\ge \min \{D[a,b], D[a,b], D[a,b]\} = D[a,b]. \end{aligned}$$

This concludes the proof of the lemma. $\square $
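The monotonicity stated in Lemma 2 can also be checked empirically on small inputs. The sketch below is an illustration under assumptions, not the paper's code: it takes Eq. 1 to be the recurrence used in the proof above, assumes the boundary values $D[0,b] = 0$ and $D[a,0] = \infty$ for $a \ge 1$ (the definition of D is not reproduced in this excerpt), and uses hypothetical helper names `run_ids`, `dp_table` and `monotone_in_blocks`.

```python
import random
from math import inf

def run_ids(s):
    """Label every position of s with the index of its run of equal letters."""
    ids, r = [], 0
    for i, c in enumerate(s):
        if i > 0 and c != s[i - 1]:
            r += 1
        ids.append(r)
    return ids

def dp_table(P, T, dist):
    """Table D with assumed boundaries D[0][b] = 0 and D[a][0] = inf for a >= 1."""
    m, n = len(P), len(T)
    D = [[0] * (n + 1)] + [[inf] * (n + 1) for _ in range(m)]
    for a in range(1, m + 1):
        for b in range(1, n + 1):
            best_prev = min(D[a - 1][b - 1], D[a - 1][b], D[a][b - 1])
            D[a][b] = dist(P[a - 1], T[b - 1]) + best_prev
    return D

def monotone_in_blocks(P, T, dist):
    """True if D never decreases when moving down or right inside a block."""
    D, rp, rt = dp_table(P, T, dist), run_ids(P), run_ids(T)
    for a in range(1, len(P) + 1):
        for b in range(1, len(T) + 1):
            # Positions a and a+1 (1-based) lie in the same run of P.
            if a < len(P) and rp[a - 1] == rp[a] and D[a][b] > D[a + 1][b]:
                return False
            # Positions b and b+1 (1-based) lie in the same run of T.
            if b < len(T) and rt[b - 1] == rt[b] and D[a][b] > D[a][b + 1]:
                return False
    return True

def abs_diff(x, y):
    """Illustrative integer distance between letters."""
    return abs(ord(x) - ord(y))

random.seed(0)
trials = [("".join(random.choices("abc", k=6)),
           "".join(random.choices("abc", k=10))) for _ in range(100)]
all_ok = all(monotone_in_blocks(P, T, abs_diff) for P, T in trials)
```

Such a check is only a sanity test on random strings; the induction above is what establishes the lemma.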

Appendix B

Theorem 2

Let P be a pattern with m runs and T a text with n runs over an alphabet $\varSigma$, both given by their run-length encodings. Assume that the $\textrm{DTW}$ distance is specified by a metric $\mu$ on $\varSigma$, and suppose that the ratio between the largest and the smallest non-zero distances between the letters of $\varSigma$ is at most exponential in $L = \max \{|P|,|T|\}$. For any $0< \varepsilon < 1$, there is an $\mathcal{O}(L^{1-\varepsilon} \cdot mn \log^3 L)$-time algorithm that computes an $\mathcal{O}(L^{\varepsilon})$-approximation of the smallest $\textrm{DTW}$ distance between P and a substring of T correctly with high probability (see Note 1).

Proof

Any metric $\mu$ on an alphabet of size $\sigma = |\varSigma|$ can be embedded in $\mathcal{O}(\sigma^2)$ time into a well-separated tree metric $\mu_\tau$ of depth $\mathcal{O}(\log \sigma)$ with expected distortion $\mathcal{O}(\log \sigma)$ (see [10] and [3, Theorem 2.4]). Furthermore, the ratio between the smallest distance and the largest distance grows at most polynomially. Formally, for any two letters a, b we have $\mu(a,b) \le \mu_\tau(a,b)$ and $\mathbb{E}(\mu_\tau(a,b)) \le \mathcal{O}(\log \sigma) \cdot \mu(a,b)$. Therefore, we have:

$$\textrm{DTW}_{\mu}(X,Y) \le \textrm{DTW}_{\mu_\tau}(X,Y) \tag{4}$$

$$\mathbb{E}(\textrm{DTW}_{\mu_\tau}(X,Y)) \le \mathcal{O}(\log \sigma) \cdot \textrm{DTW}_\mu(X,Y) \tag{5}$$

Let $\delta = \min_{S \text{ substring of } T} \textrm{DTW}_\mu(P,S)$ and $\delta_\tau = \min_{S \text{ substring of } T} \textrm{DTW}_{\mu_\tau}(P,S)$. Assume that $\delta$ is realised on a substring X, and $\delta_\tau$ on a substring $X_\tau$. By Eq. 4, we then obtain:

$$\delta = \textrm{DTW}_\mu (P,X) \le \textrm{DTW}_\mu (P,X_\tau ) \le \delta _\tau $$

And Eq. 5 gives the following:

$$\mathbb {E}(\delta _\tau ) \le \mathbb {E}(\textrm{DTW}_{\mu _\tau } (P,X)) \le \mathcal {O}(\log \sigma ) \cdot \textrm{DTW}_\mu (P,X) = \mathcal {O}(\log \sigma ) \cdot \delta $$

We apply the embedding $\log L$ times independently to obtain well-separated tree metrics $\mu _\tau ^i$, $i = 1, 2, \ldots , \log L$. From above and by Chernoff bounds,

$$\min_i \min_{S \text{ substring of } T} \textrm{DTW}_{\mu_\tau^i}(P,S)$$

gives an $\mathcal {O}(\log \sigma ) = \mathcal {O}(\log L)$ approximation of $\delta $ with high probability and can be computed in time $\mathcal {O}(L^{1-\varepsilon } \cdot mn \log ^3 L)$ by Lemma 6, concluding the proof of the theorem. $\square $
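The amplification step in the proof above can be spelled out as follows. This is one possible way to see it, written under assumptions that are not in the text: it uses Markov's inequality rather than the Chernoff bound cited in the proof, writes $\delta_\tau^i$ for the value of $\delta_\tau$ under the i-th tree metric $\mu_\tau^i$, lets c denote the constant hidden in Eq. 5, and takes logarithms to base 2. For every i,

$$\Pr \left[ \delta_\tau^i > 2c \log \sigma \cdot \delta \right] \le \frac{\mathbb{E}(\delta_\tau^i)}{2c \log \sigma \cdot \delta} \le \frac{1}{2} \quad (\delta > 0),$$

and for $\delta = 0$ the bound $\mathbb{E}(\delta_\tau^i) \le c \log \sigma \cdot \delta = 0$ forces $\delta_\tau^i = 0$ almost surely. Since the $\log L$ embeddings are independent,

$$\Pr \left[ \min_i \delta_\tau^i > 2c \log \sigma \cdot \delta \right] \le 2^{-\log L} = \frac{1}{L}.$$

Together with $\delta \le \delta_\tau^i$, which holds for every i by Eq. 4, this gives $\delta \le \min_i \delta_\tau^i \le 2c \log \sigma \cdot \delta$ with probability at least $1 - 1/L$.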

Cite this paper

Gourdel, G., Driemel, A., Peterlongo, P., Starikovskaya, T. (2022). Pattern Matching Under $\textrm{DTW}$ Distance. In: Arroyuelo, D., Poblete, B. (eds) String Processing and Information Retrieval. SPIRE 2022. Lecture Notes in Computer Science, vol 13617. Springer, Cham. https://doi.org/10.1007/978-3-031-20643-6_23

DOI: https://doi.org/10.1007/978-3-031-20643-6_23
Published: 01 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20642-9
Online ISBN: 978-3-031-20643-6
eBook Packages: Computer Science, Computer Science (R0)
