Edit-Distance Between Visibly Pushdown Languages

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10139)

Abstract

We study the edit-distance between two visibly pushdown languages. It is well-known that the edit-distance between two context-free languages is undecidable. The class of visibly pushdown languages is a robust subclass of context-free languages since it is closed under intersection and complementation whereas context-free languages are not. We show that the edit-distance problem is decidable for visibly pushdown languages and present an algorithm for computing the edit-distance based on the construction of an alignment PDA. Moreover, we show that the edit-distance can be computed in polynomial time if we assume that the edit-distance is bounded by a fixed integer k.

References

  1. Alur, R.: Marrying words and trees. In: Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2007, pp. 233–242 (2007)

  2. Alur, R., Madhusudan, P.: Visibly pushdown languages. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing, pp. 202–211 (2004)

  3. Bárány, V., Löding, C., Serre, O.: Regularity problems for visibly pushdown languages. In: Proceedings of the 23rd Annual Symposium on Theoretical Aspects of Computer Science, pp. 420–431 (2006)

  4. Choffrut, C., Pighizzini, G.: Distances between languages and reflexivity of relations. Theor. Comput. Sci. 286(1), 117–138 (2002)

  5. Han, Y.-S., Ko, S.-K., Salomaa, K.: The edit-distance between a regular language and a context-free language. Int. J. Found. Comput. Sci. 24(7), 1067–1082 (2013)

  6. Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation, 2nd edn. Addison-Wesley, Reading (1979)

  7. Kari, L., Konstantinidis, S.: Descriptional complexity of error/edit systems. J. Automata, Lang. Comb. 9, 293–309 (2004)

  8. Leike, J.: VPL intersection emptiness. Bachelor’s Thesis, University of Freiburg (2010)

  9. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966)

  10. Mehlhorn, K.: Pebbling mountain ranges and its application to DCFL-recognition. In: Automata, Languages and Programming, vol. 85, pp. 422–435 (1980)

  11. Mohri, M.: Edit-distance of weighted automata: general definitions and algorithms. Int. J. Found. Comput. Sci. 14(6), 957–982 (2003)

  12. Mozafari, B., Zeng, K., Zaniolo, C.: From regular expressions to nested words: unifying languages and query execution for relational and XML sequences. Proc. VLDB Endowment 3(1–2), 150–161 (2010)

  13. Pevzner, P.A.: Computational Molecular Biology - An Algorithmic Approach. MIT Press, Cambridge (2000)

  14. Thompson, K.: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)

  15. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21, 168–173 (1974)

  16. Wood, D.: Theory of Computation. Harper & Row, New York (1987)

Author information

Correspondence to Sang-Ki Ko.

Appendix

Context-free grammar (CFG). A context-free grammar G is a four-tuple \(G = (V, \varSigma , R, S)\), where V is a set of variables, \(\varSigma \) is a set of terminals, \(R \subseteq V \times (V\cup \varSigma )^*\) is a finite set of productions and \(S\in V\) is the start variable. Let \(\alpha A \beta \) be a word over \(V \cup \varSigma \), where \(A \in V\) and \(A \rightarrow \gamma \in R\). Then, we say that A can be rewritten as \(\gamma \) and the corresponding derivation step is denoted \(\alpha A \beta \Rightarrow \alpha \gamma \beta \). A production \(A \rightarrow t \in R\) is a terminating production if \(t \in \varSigma ^*\). The reflexive, transitive closure of \(\Rightarrow \) is denoted by \(\Rightarrow ^*\), and the context-free language generated by G is \(L(G) = \{ w \in \varSigma ^* \mid S \Rightarrow ^* w \}\). We say that a variable \(A \in V\) is nullable if \(A \Rightarrow ^* \lambda \).
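The nullable variables of a grammar can be found by a simple fixed-point iteration. The following is a minimal sketch, not taken from the paper: the dictionary encoding of productions, the function name `nullable_variables` and the example grammar are all assumptions made for illustration.

```python
def nullable_variables(productions):
    """Return the set of variables A that can derive the empty word.

    Assumed encoding: `productions` maps each variable to a list of
    right-hand sides, each a tuple of symbols. The empty tuple stands
    for lambda. Any symbol that is not a key of the dict is a terminal.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in nullable:
                continue
            # A body is nullable when every symbol in it is already known
            # to be nullable; all() over the empty tuple is True, which
            # covers productions of the form A -> lambda.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# Hypothetical grammar: S -> A B | a,  A -> lambda,  B -> A A | b
example = {
    "S": [("A", "B"), ("a",)],
    "A": [()],
    "B": [("A", "A"), ("b",)],
}
print(sorted(nullable_variables(example)))
```

Each pass either adds a variable or stops, so the loop runs at most |V| + 1 times.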

Proposition 3. Let \(L \subseteq \varSigma ^*\) and \(L' \subseteq \varSigma ^*\) be languages over \(\varSigma \). Then,

$$d(L,L') \le \max \{ \mathrm{lsw}(L), \mathrm{lsw}(L')\}$$

holds, where \(\mathrm{lsw}(L)\) denotes the length of a shortest word in L.

Proof

Assume that \(\mathrm{lsw}(L) = m\) and \(\mathrm{lsw}(L') = n\) where \(n \le m\). The edit-distance between the two shortest words is at most m: we can substitute the n characters of the shorter word with a subsequence of length n of the longer word and insert the remaining \(m - n\) characters, which takes at most \(n + (m - n) = m\) edit operations. Since \(d(L,L')\) is the minimum edit-distance over all pairs of words from L and \(L'\), the bound follows.    \(\square \)
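The bound can be checked on small cases. The sketch below is illustrative only: it assumes finite, non-empty languages given as Python sets (the paper treats infinite languages), uses the standard unit-cost dynamic program for the edit-distance between two words, and the helper names `edit_distance`, `language_distance` and `lsw` are chosen here.

```python
def edit_distance(x, y):
    """Unit-cost edit-distance (insertion, deletion, substitution)."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, a in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty word takes i deletions
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,              # delete a
                curr[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),   # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


def language_distance(L1, L2):
    """d(L1, L2) for finite, non-empty languages (an assumption of this sketch)."""
    return min(edit_distance(x, y) for x in L1 for y in L2)


def lsw(L):
    """Length of a shortest word in a finite, non-empty language."""
    return min(len(w) for w in L)


L1 = {"abc", "abcde"}
L2 = {"xy", "xyzzy"}
print(language_distance(L1, L2), max(lsw(L1), lsw(L2)))
```

In this example the two alphabets are disjoint, so the bound of Proposition 3 is attained with equality.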

Proposition 5  (Han et al. [5]). Given a PDA \(P = (Q, \varSigma , \varGamma , \delta , s, Z_0, F_P)\), we can obtain a shortest word in L(P) whose length is bounded by \(2^{m^2 l + 1}\) in \(O(n \cdot 2^{m^2 l})\) worst-case time and compute its length in \(O(m^4 nl)\) worst-case time, where \(m = |Q|\), \(n = |\delta |\) and \(l = |\varGamma |\).

Proof

Recall that we can convert a PDA into a CFG by the triple construction [6]. Let us denote the CFG obtained from P by \(G_P\). Then, \(G_P\) has \(|Q|^2\cdot |\varGamma | +1\) variables and \(|Q|^2 \cdot |\delta |\) productions. Moreover, each production of \(G_P\) is of the form \(A \rightarrow \sigma BC, A \rightarrow \sigma B, A \rightarrow \sigma \) or \(A \rightarrow \lambda \), where \(\sigma \in \varSigma \) and \(A,B,C \in V\). Since we want to compute the shortest word from \(G_P\), we can remove the occurrences of all nullable variables from \(G_P\). Then, we pick a variable A that generates the shortest word \(t \in \varSigma ^*\) among all variables and replace its occurrence in \(G_P\) with t. We can compute the shortest word of L(P) by iteratively removing occurrences of such variables. We describe the algorithm in Algorithm 1.

Since a production of \(G_P\) has at most one terminal followed by two variables, the length of the word to be substituted is at most \(2^i -1\) when we replace the i-th variable. Since we replace at most \(|Q|^2 \cdot |\varGamma |\) variables to obtain the shortest word, the length of the shortest word in L(P) can be at most \(2^{|Q|^2 \cdot |\varGamma |+1}\). Since there are at most 2|R| occurrences of variables in R and |V| variables, we replace \(\frac{2|R|}{|V|}\) occurrences of a given variable on average. Therefore, the worst-case time complexity for finding a shortest word is \(O(n \cdot 2^{m^2 l})\). We also note that we can compute only the length of the shortest word in \(O(m^4 nl)\) worst-case time by encoding a shortest word to be substituted with a binary number.    \(\square \)
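To illustrate the last remark, the length of a shortest word can be computed without building the word itself. The sketch below is not Algorithm 1 of the paper: it swaps in a plain fixed-point relaxation over the productions, and the grammar encoding, the function name `shortest_word_lengths` and the example grammar are assumptions made for illustration.

```python
import math


def shortest_word_lengths(productions):
    """Length of a shortest terminal word derivable from each variable.

    Assumed encoding: `productions` maps each variable to a list of
    right-hand sides (tuples of symbols, the empty tuple for lambda);
    symbols that are not keys are terminals. Variables that derive no
    terminal word keep the value math.inf.
    """
    best = {v: math.inf for v in productions}
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            for body in bodies:
                # A terminal contributes 1; a variable contributes the
                # best length known for it so far.
                length = sum(best.get(sym, 1) for sym in body)
                if length < best[head]:
                    best[head] = length
                    changed = True
    return best


# Hypothetical grammar in the shape produced by the triple construction:
# S -> a A B,  A -> b | a A,  B -> lambda | b B,  C -> a C (unproductive)
grammar = {
    "S": [("a", "A", "B")],
    "A": [("b",), ("a", "A")],
    "B": [(), ("b", "B")],
    "C": [("a", "C")],
}
print(shortest_word_lengths(grammar)["S"])
```

Python integers are unbounded, so lengths that grow exponentially in the number of variables are still handled exactly, which mirrors the binary encoding mentioned in the proof.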

[Figure c: Algorithm 1, listing not reproduced]

Theorem 6. Given two VPAs \(A_i = (\varSigma , \varGamma _i, Q_i, s_i, F_i, \delta _{i,c}, \delta _{i,r}, \delta _{i,l})\) for \(i = 1,2,\) we can compute the edit-distance between \(L(A_1)\) and \(L(A_2)\) in \(O( (m_1 m_2)^5 \cdot n_1n_2 \cdot (l_1l_2)^{10k})\) worst-case time, where \(m_i = |Q_i|, n_i = |\delta _{i,c}| + |\delta _{i,r}| + |\delta _{i,l}|, l_i = |\varGamma _i|\) for \(i =1,2\) and \(k = \max \{ \mathrm{lsw}(L(A_1)), \mathrm{lsw}(L(A_2))\}.\)

Proof

In the proof of Lemma 4, we have shown that we can construct an alignment PDA \(\mathcal{A}(A_1,A_2) = (Q_E ,\varOmega , \varGamma _E, s_E, F_E, \delta _E)\) that accepts all possible alignments between two VPAs \(A_1\) and \(A_2\) of length up to k. From Proposition 5, we can compute the edit-distance in \(O(m^4 n l)\) time, where \(m = |Q_E|\), \(n = |\delta _E|\) and \(l = |\varGamma _E|\). Recall that

$$m = m_1m_2 \cdot \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \cdot \Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ), \;\;n = m_1m_2 \cdot n_1n_2 \cdot \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \cdot \Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ),$$

and \(l = l_1l_2\). Note that

$$ \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \in O(l_1^{2k}) \text { and }\Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ) \in O(l_2^{2k}) $$

if \(l_1, l_2 > 1\).
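As a short worked step (added here for illustration, with l standing for either \(l_1\) or \(l_2\)), the bound follows from the closed form of the geometric sum when \(l \ge 2\), while a single stack symbol gives only a linear number of terms:

$$ \sum _{i=0}^{2k} l^{i} = \frac{l^{2k+1}-1}{l-1} \le 2\, l^{2k} \quad \text {for } l \ge 2, \qquad \sum _{i=0}^{2k} 1^{i} = 2k+1. $$

The inequality uses \(\frac{l}{l-1} \le 2\) for \(l \ge 2\).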

Therefore, the time complexity of computing the edit-distance between two VPAs \(A_1\) and \(A_2\) is

$$ (m_1 m_2)^5 \cdot n_1 n_2 \cdot \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr )^5 \cdot \Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr )^5 \cdot l_1l_2 \in O( (m_1 m_2)^5 \cdot n_1 n_2 \cdot (l_1l_2)^{10k}), $$

where k is the maximum of the lengths of the shortest words in \(L(A_1)\) and \(L(A_2)\).    \(\square \)
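The size parameters in this proof can be evaluated directly for concrete inputs. The following sketch only restates the formulas for m, n and l given above and the \(m^4 n l\) cost from Proposition 5; the function names and the sample values are assumptions made for illustration, and constant factors hidden by the O-notation are ignored.

```python
def alignment_pda_size(m1, m2, n1, n2, l1, l2, k):
    """Return (m, n, l) for the alignment PDA, following the formulas above."""
    s1 = sum(l1 ** i for i in range(2 * k + 1))  # sum_{i=0}^{2k} l1^i
    s2 = sum(l2 ** i for i in range(2 * k + 1))  # sum_{i=0}^{2k} l2^i
    m = m1 * m2 * s1 * s2
    n = m1 * m2 * n1 * n2 * s1 * s2
    l = l1 * l2
    return m, n, l


def shortest_length_cost(m, n, l):
    """The m^4 * n * l expression from Proposition 5, without constants."""
    return m ** 4 * n * l


# Hypothetical sample values: two VPAs with 2 states, 3 transitions,
# 2 stack symbols each, and k = 1.
m, n, l = alignment_pda_size(2, 2, 3, 3, 2, 2, 1)
print(m, n, l, shortest_length_cost(m, n, l))
```

With a single stack symbol per automaton (\(l_1 = l_2 = 1\)) each sum evaluates to \(2k+1\), so the same functions show the polynomial growth in k.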

Corollary 8. Given two VCAs \(A_1,A_2\) and a positive integer \(k \in \mathbb {N}\) in unary such that \(d(L(A_1),L(A_2)) \le k\), we can compute the edit-distance between \(L(A_1)\) and \(L(A_2)\) in polynomial time.

Proof

If \(l_1 = l_2 = 1\),

$$ \Biggl (\sum _{i=0}^{2k} l_1^{i}\Biggr ) \in O(k) \text { and }\Biggl (\sum _{i=0}^{2k} l_2^{i}\Biggr ) \in O(k). $$

If we replace \(l_1^{2k}\) and \(l_2^{2k}\) by k in the time complexity of Theorem 6, we obtain the time complexity \(O( (m_1 m_2)^5 \cdot n_1 n_2 \cdot k^{10})\), which is polynomial in the size of the input since k is given in unary.    \(\square \)

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Han, YS., Ko, SK. (2017). Edit-Distance Between Visibly Pushdown Languages. In: Steffen, B., Baier, C., van den Brand, M., Eder, J., Hinchey, M., Margaria, T. (eds) SOFSEM 2017: Theory and Practice of Computer Science. SOFSEM 2017. Lecture Notes in Computer Science, vol 10139. Springer, Cham. https://doi.org/10.1007/978-3-319-51963-0_30

  • DOI: https://doi.org/10.1007/978-3-319-51963-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51962-3

  • Online ISBN: 978-3-319-51963-0

  • eBook Packages: Computer Science, Computer Science (R0)
