Querying Linguistic Treebanks with Monadic Second-Order Logic in Linear Time

  • Original Article
  • Journal of Logic, Language and Information

Abstract

In recent years, large amounts of electronic text have become available. While the first of these corpora had only a low level of annotation, the more recent ones are annotated with refined syntactic information. To make these rich annotations accessible to linguists, the development of query systems has become an important goal. One of the main difficulties in this task is the choice of the right query language: it should be powerful enough to let users formulate the queries they want, and it should be efficient enough to evaluate that query response times stay short. There is a widespread belief that such a query language does not exist. The aim of this paper is therefore to show that there is indeed a powerful query language that can be evaluated efficiently. We propose the use of monadic second-order logic as a query language. We show that a query in this language can be evaluated in time linear in the size of a tree in the corpus. We also provide examples of complicated linguistic queries expressed in monadic second-order logic, thereby demonstrating the high expressive power of the language.
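The linear-time claim can be made concrete with a small sketch. The abstract does not spell out the evaluation method, so the code below is only an illustration of one standard route: the classical correspondence between monadic second-order logic over finite trees and finite tree automata, where a deterministic bottom-up automaton visits each node once. The query, the labels `NP` and `PP`, and the names `Node`, `step`, and `matches` are all hypothetical choices made for this example. The query is "some NP node properly dominates some PP node", which in MSO reads $\exists x\,\exists y\,(\mathrm{NP}(x) \wedge \mathrm{PP}(y) \wedge x \lhd^{+} y)$.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node of a syntax tree (hypothetical minimal representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)


# An automaton state is a pair of booleans:
#   has_pp: the subtree rooted here contains a PP node (possibly the root itself)
#   found : some NP node inside the subtree properly dominates a PP node
State = Tuple[bool, bool]


def step(label: str, child_states: List[State]) -> State:
    """Transition function: combine the children's states with the node label."""
    child_has_pp = any(has_pp for has_pp, _ in child_states)
    child_found = any(found for _, found in child_states)
    found = child_found or (label == "NP" and child_has_pp)
    has_pp = child_has_pp or label == "PP"
    return has_pp, found


def matches(root: Node) -> bool:
    """Run the automaton bottom-up; each node is visited exactly once."""
    def visit(node: Node) -> State:
        return step(node.label, [visit(child) for child in node.children])
    return visit(root)[1]


# Assumed example trees: in t1 a PP sits inside an NP; in t2 the PP sits under VP.
t1 = Node("S", [
    Node("NP", [Node("Det"), Node("N"),
                Node("PP", [Node("P"), Node("NP", [Node("N")])])]),
    Node("VP", [Node("V")]),
])
t2 = Node("S", [
    Node("NP", [Node("N")]),
    Node("VP", [Node("V"), Node("PP", [Node("P"), Node("NP", [Node("N")])])]),
])

print(matches(t1), matches(t2))
```

The state space here is fixed by the query (four possible states), so the work per node is proportional to its number of children and the total running time is linear in the size of the tree. For a general MSO formula the automaton can be very large, but its size depends only on the formula, not on the tree. The recursive traversal is used for brevity; very deep trees would need an explicit stack.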

Author information

Correspondence to Stephan Kepser.

About this article

Cite this article

Kepser, S. Querying Linguistic Treebanks with Monadic Second-Order Logic in Linear Time. J Logic Lang Inf 13, 457–470 (2004). https://doi.org/10.1007/s10849-004-2116-8

  • DOI: https://doi.org/10.1007/s10849-004-2116-8
