Abstract
We study the edit-distance between two visibly pushdown languages. It is well-known that the edit-distance between two context-free languages is undecidable. The class of visibly pushdown languages is a robust subclass of context-free languages since it is closed under intersection and complementation whereas context-free languages are not. We show that the edit-distance problem is decidable for visibly pushdown languages and present an algorithm for computing the edit-distance based on the construction of an alignment PDA. Moreover, we show that the edit-distance can be computed in polynomial time if we assume that the edit-distance is bounded by a fixed integer k.
References
Alur, R.: Marrying words and trees. In: Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2007, pp. 233–242 (2007)
Alur, R., Madhusudan, P.: Visibly pushdown languages. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pp. 202–211 (2004)
Bárány, V., Löding, C., Serre, O.: Regularity problems for visibly pushdown languages. In: Proceedings of the 23rd Annual Symposium on Theoretical Aspects of Computer Science, pp. 420–431 (2006)
Choffrut, C., Pighizzini, G.: Distances between languages and reflexivity of relations. Theor. Comput. Sci. 286(1), 117–138 (2002)
Han, Y.-S., Ko, S.-K., Salomaa, K.: The edit-distance between a regular language and a context-free language. Int. J. Found. Comput. Sci. 24(7), 1067–1082 (2013)
Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (1979)
Kari, L., Konstantinidis, S.: Descriptional complexity of error/edit systems. J. Automata, Lang. Comb. 9, 293–309 (2004)
Leike, J.: VPL intersection emptiness. Bachelor’s Thesis, University of Freiburg (2010)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966)
Mehlhorn, K.: Pebbling mountain ranges and its application to DCFL-recognition. In: Automata, Languages and Programming, vol. 85, pp. 422–435 (1980)
Mohri, M.: Edit-distance of weighted automata: general definitions and algorithms. Int. J. Found. Comput. Sci. 14(6), 957–982 (2003)
Mozafari, B., Zeng, K., Zaniolo, C.: From regular expressions to nested words: unifying languages and query execution for relational and XML sequences. Proc. VLDB Endowment 3(1–2), 150–161 (2010)
Pevzner, P.A.: Computational Molecular Biology - An Algorithmic Approach. MIT Press, Cambridge (2000)
Thompson, K.: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21, 168–173 (1974)
Wood, D.: Theory of Computation. Harper & Row, New York (1987)
Appendix
Context-free grammar (CFG). A context-free grammar (CFG) G is a four-tuple \(G = (V, \varSigma , R, S)\), where V is a set of variables, \(\varSigma \) is a set of terminals, \(R \subseteq V \times (V\cup \varSigma )^*\) is a finite set of productions and \(S\in V\) is the start variable. Let \(\alpha A \beta \) be a word over \(V \cup \varSigma \), where \(A \in V\) and \(A \rightarrow \gamma \in R\). Then, we say that A can be rewritten as \(\gamma \) and the corresponding derivation step is denoted \(\alpha A \beta \Rightarrow \alpha \gamma \beta \). A production \(A \rightarrow t \in R\) is a terminating production if \(t \in \varSigma ^*\). The reflexive, transitive closure of \(\Rightarrow \) is denoted by \(\Rightarrow ^*\) and the context-free language generated by G is \(L(G) = \{w \in \varSigma ^* \mid S \Rightarrow ^* w\}\). We say that a variable \(A \in V\) is nullable if \(A \Rightarrow ^* \lambda \).
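The notion of a nullable variable can be made concrete with a small fixed-point computation. The following Python sketch is illustrative only: the dictionary encoding of productions, the function name `nullable_variables` and the example grammar are assumptions made for this illustration and do not appear in the paper. An empty body stands for the empty word \(\lambda \), and any symbol that is not a key of the dictionary is treated as a terminal.

```python
from typing import Dict, List, Set

# Assumed encoding: variable -> list of production bodies.
# A body is a list of symbols; the empty list stands for lambda.
Grammar = Dict[str, List[List[str]]]


def nullable_variables(productions: Grammar) -> Set[str]:
    """Return all variables A that can derive the empty word."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in nullable:
                continue
            # A is nullable if some body consists only of nullable
            # variables (an empty body satisfies this vacuously).
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# Hypothetical grammar: S -> A B | a,  A -> lambda | a,  B -> A A | b
example = {
    "S": [["A", "B"], ["a"]],
    "A": [[], ["a"]],
    "B": [["A", "A"], ["b"]],
}
print(sorted(nullable_variables(example)))
```

In this example every variable turns out to be nullable, since A derives \(\lambda \) directly, B derives AA, and S derives AB.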
Proposition 3. Let \(L \subseteq \varSigma ^*\) and \(L' \subseteq \varSigma ^*\) be languages over \(\varSigma \). Then, \(d(L, L') \le \max \{\mathrm{lsw}(L), \mathrm{lsw}(L')\}\) holds.
Proof
Assume that \(\mathrm{lsw}(L) = m\) and \(\mathrm{lsw}(L') = n\) where \(n \le m\). It is easy to see that the edit-distance between the two shortest words is at most m, since we can substitute all n characters of the shorter word with a subsequence of length n of the longer word and insert the remaining \(m - n\) characters. \(\square \)
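The bound in Proposition 3 can be checked on small finite languages. The sketch below is only an illustration under two assumptions that are not spelled out in this appendix: every insertion, deletion and substitution has unit cost, and the distance between two languages is the minimum distance over all pairs of words. The helper names `edit_distance` and `language_distance` are hypothetical, and the dynamic program is the standard Wagner and Fischer recurrence rather than anything specific to this paper.

```python
from typing import Iterable


def edit_distance(u: str, v: str) -> int:
    """Unit-cost edit-distance between two words (row-by-row DP)."""
    prev = list(range(len(v) + 1))  # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        curr = [i]  # deleting the first i characters of u
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


def language_distance(l1: Iterable[str], l2: Iterable[str]) -> int:
    """Minimum edit-distance over all pairs from two finite languages."""
    l2 = list(l2)
    return min(edit_distance(u, v) for u in l1 for v in l2)


# Hypothetical finite languages with shortest words of length 4 and 2.
L1 = {"aaaa", "aaaaaa"}
L2 = {"bb", "bbbbb"}
bound = max(min(map(len, L1)), min(map(len, L2)))
print(language_distance(L1, L2), bound)
```

Here the distance is attained by the two shortest words and equals the bound, which shows that the inequality cannot be improved in general.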
Proposition 5 (Han et al. [5]). Given a PDA \(P = (Q, \varSigma , \varGamma , \delta , s, Z_0, F_P)\), we can obtain a shortest word in L(P) whose length is bounded by \(2^{m^2 l + 1}\) in \(O(n \cdot 2^{m^2 l})\) worst-case time and compute its length in \(O(m^4 nl)\) worst-case time, where \(m = |Q|\), \(n = |\delta |\) and \(l = |\varGamma |\).
Proof
Recall that we can convert a PDA into a CFG by the triple construction [6]. Let us denote the CFG obtained from P by \(G_P\). Then, \(G_P\) has \(|Q|^2\cdot |\varGamma | +1\) variables and \(|Q|^2 \cdot |\delta |\) productions. Moreover, each production of \(G_P\) is of the form \(A \rightarrow \sigma BC, A \rightarrow \sigma B, A \rightarrow \sigma \) or \(A \rightarrow \lambda \), where \(\sigma \in \varSigma \) and \(A,B,C \in V\). Since we want to compute the shortest word from \(G_P\), we can remove the occurrences of all nullable variables from \(G_P\). Then, we pick a variable A that generates the shortest word \(t \in \varSigma ^*\) among all variables and replace its occurrence in \(G_P\) with t. We can compute the shortest word of L(P) by iteratively removing occurrences of such variables. We describe the algorithm in Algorithm 1.
Since a production of \(G_P\) has at most one terminal followed by two variables, the length of the word to be substituted is at most \(2^i -1\) when we replace the ith variable. Since we replace at most \(|Q|^2 \cdot |\varGamma |\) variables to obtain the shortest word, the length of the shortest word in L(P) is at most \(2^{|Q|^2 \cdot |\varGamma |+1}\). Since there are at most 2|R| occurrences of variables in R and |V| variables, we replace \(\frac{2|R|}{|V|}\) occurrences of a given variable on average. Therefore, the worst-case time complexity for finding a shortest word is \(O(n \cdot 2^{m^2 l})\). We also note that we can compute only the length of the shortest word in \(O(m^4 nl)\) worst-case time by encoding a shortest word to be substituted with a binary number. \(\square \)
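The idea of computing only the length of a shortest word can be sketched in a few lines of Python. This is not the paper's Algorithm 1, which substitutes variables one at a time. It is a simpler fixed-point relaxation that reaches the same lengths without the stated complexity guarantee. The grammar encoding, the function name `shortest_word_lengths` and the example grammar are assumptions made for this illustration. Any symbol that is not a key of the dictionary is treated as a terminal of length 1.

```python
import math
from typing import Dict, List

# Assumed encoding: variable -> list of production bodies.
# A body is a list of symbols; the empty list stands for lambda.
Grammar = Dict[str, List[List[str]]]


def shortest_word_lengths(productions: Grammar) -> Dict[str, float]:
    """Map each variable to the length of a shortest terminal word it
    derives, or math.inf if it derives no terminal word at all."""
    best: Dict[str, float] = {head: math.inf for head in productions}
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            for body in bodies:
                # A terminal contributes 1, a variable its current best.
                length = sum(
                    best[sym] if sym in productions else 1 for sym in body
                )
                if length < best[head]:
                    best[head] = length
                    changed = True
    return best


# Hypothetical grammar whose productions have the shapes used in the proof:
# S -> a A B,  A -> a A | b,  B -> b B B | lambda
example = {
    "S": [["a", "A", "B"]],
    "A": [["a", "A"], ["b"]],
    "B": [["b", "B", "B"], []],
}
print(shortest_word_lengths(example))
```

Each update strictly lowers a non-negative integer value, so the loop terminates. Variables that generate no terminal word keep the value `math.inf`.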
Theorem 6 Given two VPAs \(A_i = (\varSigma , \varGamma _i, Q_i, s_i, F_i, \delta _{i,c}, \delta _{i,r}, \delta _{i,l})\) for \(i = 1,2,\) we can compute the edit-distance between \(L(A_1)\) and \(L(A_2)\) in \(O( (m_1 m_2)^5 \cdot n_1n_2 \cdot (l_1l_2)^{10k})\) worst-case time, where \(m_i = |Q_i|, n_i = |\delta _{i,c}| + |\delta _{i,r}| + |\delta _{i,l}|, l_i = |\varGamma _i|\) for \(i =1,2\) and \(k = \max \{ \mathrm{lsw}(L(A_1)), \mathrm{lsw}(L(A_2))\}.\)
Proof
In the proof of Lemma 4, we have shown that we can construct an alignment PDA \(\mathcal{A}(A_1,A_2) = (Q_E ,\varOmega , \varGamma _E, s_E, F_E, \delta _E)\) that accepts all possible alignments between two VPAs \(A_1\) and \(A_2\) of length up to k. From Proposition 5, we can compute the edit-distance in \(O(m^4 n l)\) time, where \(m = |Q_E|\), \(n = |\delta _E|\) and \(l = |\varGamma _E|\). Recall that
and \(l = l_1l_2\). Note that
if \(l_1, l_2 > 0\).
Therefore, the time complexity of computing the edit-distance between two VPAs \(A_1\) and \(A_2\) is \(O( (m_1 m_2)^5 \cdot n_1n_2 \cdot (l_1l_2)^{10k})\), where k is the maximum of the lengths of the two shortest words from \(L(A_1)\) and \(L(A_2)\). \(\square \)
Corollary 8. Given two VCAs \(A_1,A_2\) and a positive integer \(k \in \mathbb {N}\) in unary such that \(d(L(A_1),L(A_2)) \le k\), we can compute the edit-distance between \(L(A_1)\) and \(L(A_2)\) in polynomial time.
Proof
If \(l_1 = l_2 = 1\),
If we replace \(l_1^{2k}\) and \(l_2^{2k}\) by k in the time complexity, we obtain the time complexity \(O( (m_1 m_2)^5 \cdot n_1 n_2 \cdot k^{10})\), which is polynomial in the size of the input.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Han, YS., Ko, SK. (2017). Edit-Distance Between Visibly Pushdown Languages. In: Steffen, B., Baier, C., van den Brand, M., Eder, J., Hinchey, M., Margaria, T. (eds) SOFSEM 2017: Theory and Practice of Computer Science. SOFSEM 2017. Lecture Notes in Computer Science(), vol 10139. Springer, Cham. https://doi.org/10.1007/978-3-319-51963-0_30
DOI: https://doi.org/10.1007/978-3-319-51963-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51962-3
Online ISBN: 978-3-319-51963-0
eBook Packages: Computer Science, Computer Science (R0)