Abstract
Modern programming languages use regular expressions to define valid tokens. Traditional lexical analyzers based on minimal deterministic finite automata for regular expressions cannot handle the look-ahead problem: the scanner writer must explicitly identify the look-ahead states and hand-code the buffering and re-scanning operations. We identify the class of finite look-ahead finite automata, which is general enough to include the finite automata of all practical lexical analyzers. Finite look-ahead finite automata are then transformed into suffix finite automata, which a new lexical analyzer uses to identify tokens. The new lexical analyzer solves the look-ahead problem with a table-driven approach, and it can detect lexical errors earlier than traditional lexical analyzers. Its extra cost is a larger state transition table and three additional one-dimensional tables. Incremental lexical analysis is also discussed.
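For readers unfamiliar with the look-ahead problem, the sketch below shows the conventional hand-coded mechanism the abstract refers to, under stated assumptions: a small hypothetical token set (INT, REAL, DOT, DOTDOT) in which, after reading `1.`, the scanner cannot yet tell whether it is inside a REAL (`1.5`) or has read one character too many (`1..10`). The DFA runs as far as it can, remembers the last accepting position, and pushes the extra characters back for re-scanning. This is the textbook longest-match technique with backtracking, not the suffix finite automata proposed in the paper; the names `scan`, `DELTA`, `ACCEPTING` and `char_class` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical token set (an assumption for illustration, not from the paper):
#   INT    = digit+
#   REAL   = digit+ "." digit+
#   DOT    = "."
#   DOTDOT = ".."


def char_class(ch: str) -> str:
    """Map a character to the column used in the transition table."""
    if ch.isdigit():
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


# Transition table: state -> {character class -> next state}.
# A missing entry means the DFA has no transition (it is stuck).
DELTA: Dict[int, Dict[str, int]] = {
    0: {"digit": 1, "dot": 4},  # start
    1: {"digit": 1, "dot": 2},  # digit+             (accepting: INT)
    2: {"digit": 3},            # digit+ "."         (not accepting)
    3: {"digit": 3},            # digit+ "." digit+  (accepting: REAL)
    4: {"dot": 5},              # "."                (accepting: DOT)
    5: {},                      # ".."               (accepting: DOTDOT)
}
ACCEPTING: Dict[int, str] = {1: "INT", 3: "REAL", 4: "DOT", 5: "DOTDOT"}


def scan(text: str) -> Tuple[List[Tuple[str, str]], int]:
    """Longest-match scanning with backtracking.

    Returns the token list and the number of characters that were read
    past a token's end and then pushed back for re-scanning.
    """
    tokens: List[Tuple[str, str]] = []
    pushed_back = 0
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # skip blanks between tokens
            pos += 1
            continue
        state = 0
        i = pos
        last_accept: Optional[Tuple[int, str]] = None  # (end position, kind)
        # Run the DFA as far as possible, remembering the last accepting point.
        while i < len(text):
            nxt = DELTA[state].get(char_class(text[i]))
            if nxt is None:
                break
            state = nxt
            i += 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            raise ValueError(f"lexical error at position {pos}")
        end, kind = last_accept
        pushed_back += i - end  # look-ahead characters that must be re-scanned
        tokens.append((kind, text[pos:end]))
        pos = end  # resume scanning right after the recognized token
    return tokens, pushed_back
```

With this table, `scan("1..10")` returns `([('INT', '1'), ('DOTDOT', '..'), ('INT', '10')], 1)`: the second `.` was read while looking for a REAL and had to be pushed back. `scan("1.5")` yields a single REAL with nothing pushed back. The `pushed_back` counter makes visible the buffering and re-scanning work that, according to the abstract, a table-driven suffix-automaton scanner avoids hand-coding.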
Additional information
This work was supported in part by the National Science Council, Taiwan, R.O.C., under grants NSC 83-0111-S-009-001-CL and NSC 84-2213-E-009-043.
About this article
Cite this article
Yang, W. On the look-ahead problem in lexical analysis. Acta Informatica 32, 459–476 (1995). https://doi.org/10.1007/BF01213079