Abstract
The Generalized LR (GLR) parsing algorithm is attractive for use in parsing programming languages because it is asymptotically efficient for typical grammars, and can parse with any context-free grammar, including ambiguous grammars. However, adoption of GLR has been slowed by high constant-factor overheads and the lack of a general, user-defined action interface.
In this paper we present algorithmic and implementation enhancements to GLR to solve these problems. First, we present a hybrid algorithm that chooses between GLR and ordinary LR on a token-by-token basis, thus achieving competitive performance for deterministic input fragments. Second, we describe a design for an action interface and a new worklist algorithm that can guarantee bottom-up execution of actions for acyclic grammars. These ideas are implemented in the Elkhound GLR parser generator.
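To make the token-by-token choice concrete, the following is a minimal sketch of a toy recognizer for the ambiguous grammar E -> E + E | n. It is an illustration under stated assumptions, not Elkhound's implementation: the parse table is written by hand, the names (`ACTION`, `GOTO`, `step_lr`, `step_glr`, `recognize`) are hypothetical, no user actions are run, and the general path keeps separate copies of each stack instead of a graph-structured stack. For each token it first tries a plain LR step when exactly one stack is live and every table entry it meets holds a single action; otherwise it falls back to a general step that forks on conflicts.

```python
# Toy hybrid LR/GLR recognizer for the ambiguous grammar
#   0: E -> E + E      1: E -> n
# Hypothetical, hand-built table; not Elkhound's data structures.
PRODS = {0: ("E", 3), 1: ("E", 1)}           # production -> (lhs, rhs length)
ACTION = {                                   # (state, token) -> list of actions
    (0, "n"): [("shift", 1)],
    (1, "+"): [("reduce", 1)], (1, "$"): [("reduce", 1)],
    (2, "+"): [("shift", 3)],  (2, "$"): [("accept",)],
    (3, "n"): [("shift", 1)],
    (4, "+"): [("shift", 3), ("reduce", 0)],  # shift/reduce conflict
    (4, "$"): [("reduce", 0)],
}
GOTO = {(0, "E"): 2, (3, "E"): 4}
ACCEPT = object()                            # sentinel returned on acceptance


def step_lr(stack, tok):
    """Consume tok with ordinary LR on a copy of the stack.

    Returns the new stack, ACCEPT, or None if a conflict or error entry
    is met, in which case the caller retries with the general step.
    """
    stack = list(stack)
    while True:
        acts = ACTION.get((stack[-1], tok), [])
        if len(acts) != 1:
            return None
        act = acts[0]
        if act[0] == "shift":
            stack.append(act[1])
            return tuple(stack)
        if act[0] == "accept":
            return ACCEPT
        lhs, n = PRODS[act[1]]               # reduce: pop rhs, push goto state
        del stack[-n:]
        stack.append(GOTO[(stack[-1], lhs)])


def step_glr(stacks, tok):
    """Consume tok on every live stack, forking a copy on each conflict."""
    work, seen = list(stacks), set(stacks)
    shifted, accepted = set(), False
    while work:
        stack = work.pop()
        for act in ACTION.get((stack[-1], tok), []):
            if act[0] == "shift":
                shifted.add(stack + (act[1],))
            elif act[0] == "accept":
                accepted = True
            else:
                lhs, n = PRODS[act[1]]
                base = stack[:-n]
                new = base + (GOTO[(base[-1], lhs)],)
                if new not in seen:          # avoid reprocessing a stack
                    seen.add(new)
                    work.append(new)
    return shifted, accepted


def recognize(tokens):
    """Return (accepted, modes), where modes records the path used per token."""
    stacks, modes = {(0,)}, []
    for tok in list(tokens) + ["$"]:
        nxt = step_lr(next(iter(stacks)), tok) if len(stacks) == 1 else None
        if nxt is ACCEPT:
            modes.append("LR")
            return True, modes
        if nxt is not None:
            modes.append("LR")
            stacks = {nxt}
            continue
        stacks, accepted = step_glr(stacks, tok)
        modes.append("GLR")
        if accepted:
            return True, modes
        if not stacks:
            return False, modes
    return False, modes
```

With this table, `recognize(["n", "+", "n"])` never meets a conflict and stays on the LR path for all four tokens (including the end marker), while `recognize(["n", "+", "n", "+", "n"])` switches to the general path at the second `+`, where the shift/reduce conflict first appears. Because the sketch copies stacks instead of sharing them, it returns to the LR path only when all but one fork has died; a graph-structured stack would also let forks that reach the same state merge.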
To demonstrate the effectiveness of these techniques, we describe our experience using Elkhound to write a parser for C++, a language notorious for being difficult to parse. Our C++ parser is small (3500 lines), efficient and maintainable, employing a range of disambiguation strategies.
This research was supported in part by the National Science Foundation Career Grants No. CCR-9875171, No. CCR-0081588, No. CCR-0085949 and No. CCR-0326577, and ITR Grants No. CCR-0085949 and No. CCR-0081588, and gifts from Microsoft Research.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McPeak, S., Necula, G.C. (2004). Elkhound: A Fast, Practical GLR Parser Generator. In: Duesterwald, E. (eds) Compiler Construction. CC 2004. Lecture Notes in Computer Science, vol 2985. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24723-4_6
DOI: https://doi.org/10.1007/978-3-540-24723-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21297-3
Online ISBN: 978-3-540-24723-4
eBook Packages: Springer Book Archive