Stochastic k-testable Tree Languages and Applications

  • Conference paper
Grammatical Inference: Algorithms and Applications (ICGI 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2484)

Abstract

In this paper, we describe a generalization of k-gram models to stochastic tree languages. These models are based on the k-testable class, a subclass of the languages recognizable by ascending tree automata. One advantage of this approach is that the probabilistic model can be updated incrementally. Another is that backing-off schemes can be defined. To illustrate their applicability, the models have been used to compress tree data files at a better rate than string-based methods.
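To make the idea of an incrementally updatable tree model concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. It fixes k = 2, so the context of a node is just its own label, and estimates the probability of a sequence of child labels given the parent label from relative frequencies. The class name `TreeBigramModel`, its methods, and the `(label, children)` tree encoding are hypothetical choices made for this illustration. Backing-off and the distribution over root labels are left out.

```python
from collections import Counter
from math import log2

# Assumed encoding: a tree is (label, [children]); a leaf has no children.


class TreeBigramModel:
    """Illustrative depth-2 tree model: P(child labels | parent label)."""

    def __init__(self):
        # (parent label, tuple of child labels) -> number of occurrences
        self.fork_counts = Counter()
        # parent label -> number of nodes seen with that label
        self.label_counts = Counter()

    def update(self, tree):
        """Add the counts of one more tree; earlier counts stay valid."""
        label, children = tree
        self.fork_counts[(label, tuple(c[0] for c in children))] += 1
        self.label_counts[label] += 1
        for child in children:
            self.update(child)

    def prob(self, label, child_labels):
        """Relative-frequency estimate; 0.0 for an unseen parent label."""
        total = self.label_counts[label]
        if total == 0:
            return 0.0
        return self.fork_counts[(label, tuple(child_labels))] / total

    def code_length(self, tree):
        """Ideal code length in bits of the tree's expansions.

        The root label itself is not charged, and an unseen expansion
        costs infinity because this sketch has no backing-off.
        """
        label, children = tree
        p = self.prob(label, [c[0] for c in children])
        if p == 0.0:
            return float("inf")
        return -log2(p) + sum(self.code_length(c) for c in children)


model = TreeBigramModel()
model.update(("S", [("a", []), ("b", [])]))
model.update(("S", [("a", []), ("S", [("a", []), ("b", [])])]))
print(model.prob("S", ["a", "b"]))
print(model.code_length(("S", [("a", []), ("b", [])])))
```

Because the model is only a pair of count tables, seeing a new tree means adding to those tables, with no re-estimation from scratch. The code length is the quantity an arithmetic coder driven by the model would approach, which is how a probabilistic tree model connects to compression.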

Work supported by the Spanish Comisión Interministerial de Ciencia y Tecnología through grant TIC2000-1599-C02.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rico-Juan, J.R., Calera-Rubio, J., Carrasco, R.C. (2002). Stochastic k-testable Tree Languages and Applications. In: Adriaans, P., Fernau, H., van Zaanen, M. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2002. Lecture Notes in Computer Science, vol 2484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45790-9_16

  • DOI: https://doi.org/10.1007/3-540-45790-9_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44239-4

  • Online ISBN: 978-3-540-45790-9

  • eBook Packages: Springer Book Archive
