Edit-Distance Between Visibly Pushdown Languages

Han, Yo-Sub; Ko, Sang-Ki

doi:10.1007/978-3-319-51963-0_30

Conference paper
First Online: 11 January 2017

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10139)

Abstract

We study the edit-distance between two visibly pushdown languages. It is well-known that the edit-distance between two context-free languages is undecidable. The class of visibly pushdown languages is a robust subclass of context-free languages since it is closed under intersection and complementation whereas context-free languages are not. We show that the edit-distance problem is decidable for visibly pushdown languages and present an algorithm for computing the edit-distance based on the construction of an alignment PDA. Moreover, we show that the edit-distance can be computed in polynomial time if we assume that the edit-distance is bounded by a fixed integer k.

References

1. Alur, R.: Marrying words and trees. In: Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2007, pp. 233–242 (2007)
2. Alur, R., Madhusudan, P.: Visibly pushdown languages. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pp. 202–211 (2004)
3. Bárány, V., Löding, C., Serre, O.: Regularity problems for visibly pushdown languages. In: Proceedings of the 23rd Annual Symposium on Theoretical Aspects of Computer Science, pp. 420–431 (2006)
4. Choffrut, C., Pighizzini, G.: Distances between languages and reflexivity of relations. Theor. Comput. Sci. 286(1), 117–138 (2002)
5. Han, Y.-S., Ko, S.-K., Salomaa, K.: The edit-distance between a regular language and a context-free language. Int. J. Found. Comput. Sci. 24(7), 1067–1082 (2013)
6. Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (1979)
7. Kari, L., Konstantinidis, S.: Descriptional complexity of error/edit systems. J. Automata, Lang. Comb. 9, 293–309 (2004)
8. Leike, J.: VPL intersection emptiness. Bachelor’s Thesis, University of Freiburg (2010)
9. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966)
10. Mehlhorn, K.: Pebbling mountain ranges and its application to DCFL-recognition. In: Automata, Languages and Programming, vol. 85, pp. 422–435 (1980)
11. Mohri, M.: Edit-distance of weighted automata: general definitions and algorithms. Int. J. Found. Comput. Sci. 14(6), 957–982 (2003)
12. Mozafari, B., Zeng, K., Zaniolo, C.: From regular expressions to nested words: unifying languages and query execution for relational and XML sequences. Proc. VLDB Endowment 3(1–2), 150–161 (2010)
13. Pevzner, P.A.: Computational Molecular Biology - An Algorithmic Approach. MIT Press, Cambridge (2000)
14. Thompson, K.: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
15. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21, 168–173 (1974)
16. Wood, D.: Theory of Computation. Harper & Row, New York (1987)

Author information

Authors and Affiliations

Department of Computer Science, Yonsei University, 50, Yonsei-Ro, Seodaemun-Gu, Seoul, 120-749, Republic of Korea
Yo-Sub Han
Department of Computer Science, University of Liverpool, Ashton Street, Liverpool, L69 3BX, UK
Sang-Ki Ko

Corresponding author

Correspondence to Sang-Ki Ko.

Editor information

Editors and Affiliations

TU Dortmund, Dortmund, Germany
Bernhard Steffen
TU Dresden, Dresden, Germany
Christel Baier
Eindhoven University of Technology, Eindhoven, The Netherlands
Mark van den Brand
Alpen Adria University Klagenfurt, Klagenfurt, Austria
Johann Eder
Lero - Irish Software Research Center, Limerick, Ireland
Mike Hinchey
Lero - Irish Software Research Center, Limerick, Ireland
Tiziana Margaria

Appendix

Context-free grammar (CFG). A context-free grammar (CFG) G is a four-tuple $G = (V, \varSigma , R, S)$, where V is a set of variables, $\varSigma $ is a set of terminals, $R \subseteq V \times (V\cup \varSigma )^*$ is a finite set of productions and $S\in V$ is the start variable. Let $\alpha A \beta $ be a word over $V \cup \varSigma $, where $A \in V$ and $A \rightarrow \gamma \in R$. Then, we say that A can be rewritten as $\gamma $ and the corresponding derivation step is denoted $\alpha A \beta \Rightarrow \alpha \gamma \beta $. A production $A \rightarrow t \in R$ is a terminating production if $t \in \varSigma ^*$. The reflexive, transitive closure of $\Rightarrow $ is denoted by $\Rightarrow ^*$ and the context-free language generated by G is $L(G) = \{ w \in \varSigma ^* \mid S \Rightarrow ^* w \}$. We say that a variable $A \in V$ is nullable if $A \Rightarrow ^* \lambda $.
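The set of nullable variables can be computed by a standard fixed-point iteration. The following Python sketch is added for illustration and is not taken from the paper; the encoding of productions as (head, body) pairs, the function name `nullable_variables`, and the example grammar are all assumptions.

```python
def nullable_variables(productions):
    """Return the set of variables A with A =>* lambda.

    `productions` is a list of (head, body) pairs, where body is a tuple of
    symbols; the empty tuple encodes A -> lambda. This encoding is an
    illustrative assumption, not notation from the paper.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # A head becomes nullable once every symbol of one of its bodies
            # is nullable. all() over an empty body is True, which covers
            # A -> lambda. Terminals are never added, so any body that
            # contains a terminal can never qualify.
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


# Hypothetical grammar: S -> AB | a, A -> lambda, B -> AA | b, C -> cC
example = [
    ("S", ("A", "B")),
    ("S", ("a",)),
    ("A", ()),
    ("B", ("A", "A")),
    ("B", ("b",)),
    ("C", ("c", "C")),
]
result = nullable_variables(example)
```

In this example `result` contains S, A and B, while C is not nullable because its only production always emits a terminal.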

Proposition 3. Let $L \subseteq \varSigma ^*$ and $L' \subseteq \varSigma ^*$ be languages over $\varSigma $. Then,

$$d(L,L') \le \max \{ \mathrm{lsw}(L), \mathrm{lsw}(L')\}$$

holds.

Proof

Assume that $\mathrm{lsw}(L) = m$ and $\mathrm{lsw}(L') = n$, where $n \le m$. Let $w \in L$ and $w' \in L'$ be shortest words, so that $|w| = m$ and $|w'| = n$. The edit-distance between $w$ and $w'$ is at most $m$, since we can substitute the $n$ characters of $w'$ with a subsequence of $w$ of length $n$ and insert the remaining $m - n$ characters, which takes at most $n + (m - n) = m$ operations. Since $d(L,L') \le d(w,w')$, the claim follows. $\square $
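The bound can be checked on small finite languages. The sketch below is an illustration added here, not part of the paper: it uses the classical dynamic program for the edit-distance between two words with unit costs, and the helper names `edit_distance`, `language_distance` and `lsw`, as well as the sample languages, are assumptions.

```python
from itertools import product


def edit_distance(u: str, v: str) -> int:
    """Unit-cost edit-distance (insertion, deletion, substitution)."""
    prev = list(range(len(v) + 1))  # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        curr = [i]  # deleting the first i characters of u
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


def language_distance(L1, L2) -> int:
    """Minimum edit-distance over all pairs of words (finite languages only)."""
    return min(edit_distance(u, v) for u, v in product(L1, L2))


def lsw(L) -> int:
    """Length of a shortest word of a finite, non-empty language."""
    return min(len(w) for w in L)


L1 = {"abab", "aaaaaa"}
L2 = {"bb", "bbbbb"}
d = language_distance(L1, L2)
bound = max(lsw(L1), lsw(L2))
```

Here the closest pair is "abab" and "bb" (delete both occurrences of a), so `d` is 2, which is within the bound `max(4, 2) = 4`.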

Proposition 5 (Han et al. [5]). Given a PDA $P = (Q, \varSigma , \varGamma , \delta , s, Z_0, F_P)$, we can obtain a shortest word in L(P) whose length is bounded by $2^{m^2 l + 1}$ in $O(n \cdot 2^{m^2 l})$ worst-case time and compute its length in $O(m^4 nl)$ worst-case time, where $m = |Q|$, $n = |\delta |$ and $l = |\varGamma |$.

Proof

Recall that we can convert a PDA into a CFG by the triple construction [6]. Let us denote the CFG obtained from P by $G_P$. Then, $G_P$ has $|Q|^2\cdot |\varGamma | +1$ variables and $|Q|^2 \cdot |\delta |$ productions. Moreover, each production of $G_P$ is of the form $A \rightarrow \sigma BC, A \rightarrow \sigma B, A \rightarrow \sigma $ or $A \rightarrow \lambda $, where $\sigma \in \varSigma $ and $A,B,C \in V$. Since we want to compute the shortest word from $G_P$, we can remove the occurrences of all nullable variables from $G_P$. Then, we pick a variable A that generates the shortest word $t \in \varSigma ^*$ among all variables and replace its occurrence in $G_P$ with t. We can compute the shortest word of L(P) by iteratively removing occurrences of such variables. We describe the algorithm in Algorithm 1.

Since a production of $G_P$ has at most one terminal followed by two variables, the length of the word to be substituted is at most $2^i - 1$ when we replace the $i$th variable. Since we replace at most $|Q|^2 \cdot |\varGamma |$ variables to obtain the shortest word, the length of the shortest word in L(P) can be at most $2^{|Q|^2 \cdot |\varGamma |+1}$. Since there are at most $2|R|$ occurrences of variables in R and $|V|$ variables, we replace $\frac{2|R|}{|V|}$ occurrences of a given variable on average. Therefore, the worst-case time complexity for finding a shortest word is $O(n \cdot 2^{m^2 l})$. We also note that we can compute only the length of the shortest word in $O(m^4 nl)$ worst-case time by encoding a shortest word to be substituted with a binary number. $\square $
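To make the idea of computing only the length of a shortest word concrete, here is a minimal Python sketch. It is not Algorithm 1 of the paper: it swaps in a plain fixed-point relaxation over the productions, which is simpler but does not achieve the stated running time. The grammar encoding, the function name `shortest_word_lengths` and the example grammar are assumptions.

```python
import math


def shortest_word_lengths(variables, productions):
    """Length of a shortest terminal word derivable from each variable.

    `productions` is a list of (head, body) pairs with body a tuple of
    symbols; any symbol not in `variables` is treated as a terminal.
    Variables that derive no terminal word keep the value math.inf.
    """
    best = {A: math.inf for A in variables}
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # A terminal contributes 1, a variable its best length so far.
            length = sum(best[s] if s in best else 1 for s in body)
            if length < best[head]:
                best[head] = length
                changed = True
    return best


# Hypothetical grammar with productions of the forms A -> sBC, A -> sB, A -> s:
# S -> aBC, B -> b, C -> cC | c, D -> dD
grammar_variables = {"S", "B", "C", "D"}
grammar_productions = [
    ("S", ("a", "B", "C")),
    ("B", ("b",)),
    ("C", ("c", "C")),
    ("C", ("c",)),
    ("D", ("d", "D")),
]
lengths = shortest_word_lengths(grammar_variables, grammar_productions)
```

In this example S derives the shortest word abc of length 3, and D derives no terminal word, so its entry stays at infinity. The loop terminates because every update strictly decreases a non-negative integer value.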

Theorem 6. Given two VPAs $A_i = (\varSigma , \varGamma _i, Q_i, s_i, F_i, \delta _{i,c}, \delta _{i,r}, \delta _{i,l})$ for $i = 1,2$, we can compute the edit-distance between $L(A_1)$ and $L(A_2)$ in $O( (m_1 m_2)^5 \cdot n_1n_2 \cdot (l_1l_2)^{10k})$ worst-case time, where $m_i = |Q_i|, n_i = |\delta _{i,c}| + |\delta _{i,r}| + |\delta _{i,l}|, l_i = |\varGamma _i|$ for $i =1,2$ and $k = \max \{ \mathrm{lsw}(L(A_1)), \mathrm{lsw}(L(A_2))\}$.

Proof

In the proof of Lemma 4, we have shown that we can construct an alignment PDA $\mathcal{A}(A_1,A_2) = (Q_E ,\varOmega , \varGamma _E, s_E, F_E, \delta _E)$ that accepts all possible alignments between two VPAs $A_1$ and $A_2$ of length up to k. From Proposition 5, we can compute the edit-distance in $O(m^4 n l)$ time, where $m = |Q_E|$, $n = |\delta _E|$ and $l = |\varGamma _E|$. Recall that

$$m = m_1m_2 \cdot \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \cdot \Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ), \;\;n = m_1m_2 \cdot n_1n_2 \cdot \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \cdot \Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ),$$

and $l = l_1l_2$. Note that

$$ \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \in O(l_1^{2k}) \text { and }\Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ) \in O(l_2^{2k}) $$

if $l_1, l_2 > 1$.
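As a brief check of this estimate (a worked step added for illustration, with $l$ standing for either $l_1$ or $l_2$), the closed form of the geometric sum gives, for $l \ge 2$,

$$ \sum _{i=0}^{2k} l^{i} = \frac{l^{2k+1} - 1}{l - 1} \le \frac{l}{l-1} \cdot l^{2k} \le 2 \cdot l^{2k}, $$

so the sum is in $O(l^{2k})$. For $l = 1$ the sum equals $2k + 1$, which is not bounded by a constant; this is the case treated in the proof of Corollary 8.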

Therefore, the time complexity of computing the edit-distance between two VPAs $A_1$ and $A_2$ is

$$ (m_1 m_2)^5 \cdot n_1 n_2 \cdot \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr )^5 \cdot \Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr )^5 \cdot l_1l_2 \in O( (m_1 m_2)^5 \cdot n_1 n_2 \cdot (l_1l_2)^{10k}), $$

where $k$ is the maximum of the lengths of the shortest words in $L(A_1)$ and $L(A_2)$. $\square $
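The arithmetic behind this bound can be replayed numerically. The following Python sketch only evaluates the size formulas for $m$, $n$ and $l$ quoted in the proof and the estimate $m^4 n l$; it does not construct the alignment PDA. The function names and the sample parameter values are assumptions chosen for illustration.

```python
def geometric_sum(l: int, k: int) -> int:
    """Return sum_{i=0}^{2k} l^i."""
    return sum(l ** i for i in range(2 * k + 1))


def alignment_pda_size(m1, m2, n1, n2, l1, l2, k):
    """Sizes (m, n, l) of the alignment PDA as quoted in the proof."""
    s1 = geometric_sum(l1, k)
    s2 = geometric_sum(l2, k)
    m = m1 * m2 * s1 * s2            # number of states
    n = m1 * m2 * n1 * n2 * s1 * s2  # number of transitions
    l = l1 * l2                      # size of the stack alphabet
    return m, n, l


def cost_estimate(m1, m2, n1, n2, l1, l2, k):
    """Evaluate m^4 * n * l, the estimate taken from Proposition 5."""
    m, n, l = alignment_pda_size(m1, m2, n1, n2, l1, l2, k)
    return m ** 4 * n * l


# Hypothetical parameters: 2 and 3 states, 4 and 5 transitions,
# two stack symbols each, and k = 1.
sizes = alignment_pda_size(2, 3, 4, 5, 2, 2, 1)
cost = cost_estimate(2, 3, 4, 5, 2, 2, 1)
```

With these values each geometric sum is $1 + 2 + 4 = 7$, so `sizes` is `(294, 5880, 4)`, and `cost` agrees with the expanded product $(m_1 m_2)^5 \cdot n_1 n_2 \cdot 7^5 \cdot 7^5 \cdot l_1 l_2$ from the displayed formula.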

Corollary 8. Given two VCAs $A_1,A_2$ and a positive integer $k \in \mathbb {N}$ in unary such that $d(L(A_1),L(A_2)) \le k$, we can compute the edit-distance between $L(A_1)$ and $L(A_2)$ in polynomial time.

Proof

If $l_1 = l_2 = 1$,

$$ \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \in O(k) \text { and }\Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ) \in O(k). $$

If we replace $l_1^{2k}$ and $l_2^{2k}$ by $k$ in the time complexity of Theorem 6, we obtain the time complexity $O( (m_1 m_2)^5 \cdot n_1 n_2 \cdot k^{10})$, which is polynomial in the size of the input since $k$ is given in unary. $\square $
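Spelled out as a worked step (added for illustration), for $l_1 = l_2 = 1$ each sum equals $2k+1$, and substituting into the expression from the proof of Theorem 6 gives

$$ (m_1 m_2)^5 \cdot n_1 n_2 \cdot (2k+1)^5 \cdot (2k+1)^5 \cdot 1 = (m_1 m_2)^5 \cdot n_1 n_2 \cdot (2k+1)^{10} \le 3^{10} \cdot (m_1 m_2)^5 \cdot n_1 n_2 \cdot k^{10} $$

for every $k \ge 1$, since $2k + 1 \le 3k$.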

About this paper

Cite this paper

Han, YS., Ko, SK. (2017). Edit-Distance Between Visibly Pushdown Languages. In: Steffen, B., Baier, C., van den Brand, M., Eder, J., Hinchey, M., Margaria, T. (eds) SOFSEM 2017: Theory and Practice of Computer Science. SOFSEM 2017. Lecture Notes in Computer Science, vol 10139. Springer, Cham. https://doi.org/10.1007/978-3-319-51963-0_30

DOI: https://doi.org/10.1007/978-3-319-51963-0_30
Published: 11 January 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51962-3
Online ISBN: 978-3-319-51963-0
eBook Packages: Computer Science, Computer Science (R0)
