Abstract
In recent years, large amounts of electronic text have become available. While the first of these corpora had only a low level of annotation, more recent ones are annotated with refined syntactic information. To make these rich annotations accessible to linguists, the development of query systems has become an important goal. One of the main difficulties in this task is the choice of the right query language: it should be powerful enough to let users formulate the queries they want, yet efficient enough to evaluate that query response times stay short. There is a widespread belief that no such query language exists. The aim of this paper is to show that there is indeed a powerful query language that can be evaluated efficiently. We propose the use of monadic second-order logic as a query language. We show that a query in this language can be evaluated in time linear in the size of a tree in the corpus. We also provide examples of complicated linguistic queries expressed in monadic second-order logic, thereby demonstrating the high expressive power of the language.
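To make the linear-time claim concrete, here is a minimal sketch of one standard route to it: a fixed monadic second-order property of trees can be checked by a deterministic bottom-up tree automaton that visits each node once. This is an illustration under stated assumptions, not necessarily the construction used in the paper. The query ("some NP node properly dominates a PP node"), the node labels, the nested-tuple tree representation, and the names `step`, `run`, and `matches` are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumed representation: a node is (label, list of child nodes).
Tree = Tuple[str, List["Tree"]]

# Automaton states for the illustrative query
# "there is an NP node that properly dominates a PP node".
NONE, HAS_PP, MATCH = 0, 1, 2


def step(label: str, child_states: List[int]) -> int:
    """Transition: compute a node's state from its label and its children's states."""
    if MATCH in child_states:
        return MATCH                 # a match was already found further down
    pp_below = HAS_PP in child_states
    if label == "NP" and pp_below:
        return MATCH                 # this NP dominates a PP
    if label == "PP" or pp_below:
        return HAS_PP                # the subtree rooted here contains a PP
    return NONE


def run(tree: Tree) -> int:
    """Bottom-up run: every node is processed exactly once."""
    label, children = tree
    return step(label, [run(child) for child in children])


def matches(tree: Tree) -> bool:
    """True if the tree satisfies the query."""
    return run(tree) == MATCH
```

Because the state set is fixed by the query and each node is handled once, the total work grows linearly with the number of nodes. For example, `matches` returns `True` for a sentence tree whose subject NP contains a PP, and `False` when the only PP hangs under the VP.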
Cite this article
Kepser, S. Querying Linguistic Treebanks with Monadic Second-Order Logic in Linear Time. J Logic Lang Inf 13, 457–470 (2004). https://doi.org/10.1007/s10849-004-2116-8