An Extendible Regular Expression Compiler for Finite-State Approaches in Natural Language Processing

  • Conference paper
Automata Implementation (WIA 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2214)

Abstract

Finite-state techniques are widely used in various areas of Natural Language Processing (NLP). As Kaplan and Kay [12] have argued, regular expressions are the appropriate level of abstraction for thinking about finite-state languages and finite-state relations. More complex finite-state operations (such as contexted replacement) are defined in terms of basic operations (such as Kleene closure, complementation, and composition).

To support experimentation with such complex finite-state operations, the FSA Utilities (version 5) provides an extendible regular expression compiler. The paper discusses the regular expression operations provided by the compiler and the facilities for creating new regular expression operators. The benefits of such an extendible regular expression compiler are illustrated with a number of examples taken from recent publications in the area of finite-state approaches to NLP.
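
The idea of building complex operators from basic ones, and of letting users register new operators, can be illustrated with a minimal Python sketch. This is not the FSA Utilities interface: every name here (`Lang`, `MACROS`, `define`, `contains`, `free`) is a hypothetical choice made for illustration. The sketch models a language as a membership test on strings rather than compiling to an automaton, and it covers only languages, not relations (so composition is left out).

```python
from typing import Callable, Dict

# A language is modelled as a membership predicate over strings
# (an assumption of this sketch; a real compiler would build automata).
Lang = Callable[[str], bool]

# --- basic operations ---

def sym(c: str) -> Lang:
    """Language containing exactly the one-symbol string c."""
    return lambda s: s == c

def any_sym() -> Lang:
    """Language of all one-symbol strings."""
    return lambda s: len(s) == 1

def concat(a: Lang, b: Lang) -> Lang:
    """Concatenation: try every split point of the input."""
    return lambda s: any(a(s[:i]) and b(s[i:]) for i in range(len(s) + 1))

def star(a: Lang) -> Lang:
    """Kleene closure: peel off non-empty prefixes that are in a."""
    def member(s: str) -> bool:
        if s == "":
            return True
        return any(a(s[:i]) and member(s[i:]) for i in range(1, len(s) + 1))
    return member

def complement(a: Lang) -> Lang:
    """Complementation with respect to all strings."""
    return lambda s: not a(s)

# --- extension mechanism: a registry of user-defined operators ---

MACROS: Dict[str, Callable[..., Lang]] = {}

def define(name: str):
    """Register a new operator under the given name."""
    def register(fn):
        MACROS[name] = fn
        return fn
    return register

@define("contains")
def contains(a: Lang) -> Lang:
    # Derived operator: any string with a substring in a.
    sigma_star = star(any_sym())
    return concat(sigma_star, concat(a, sigma_star))

@define("free")
def free(a: Lang) -> Lang:
    # Derived operator: strings with no substring in a.
    return complement(contains(a))

# Strings that do not contain the substring "ab".
no_ab = MACROS["free"](concat(sym("a"), sym("b")))
```

Here `free` is written purely in terms of `contains` and `complement`, which mirrors how a derived operator can be layered on basic ones. The membership tests are exponential in the worst case and are only meant for short inputs.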

References

  1. Steven Abney. Partial parsing via finite-state cascades. In John Carroll, editor, Workshop on Robust Parsing; Eighth European Summer School in Logic, Language and Information, pages 8–15, 1995.

  2. Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.

  3. Gosse Bouma. A modern computational linguistics course using Dutch. In EACL 99: Computer and Internet Supported Education in Language and Speech Technology. Proceedings of a Workshop sponsored by ELSNET and The Association for Computational Linguistics, Bergen Norway, 1999.

  4. Christian S. Calude, Kai Salomaa, and Sheng Yu. Metric lexical analysis. In O. Boldt, H. Juergensen, and L. Robbins, editors, Workshop on Implementing Automata; WIA99 Pre-Proceedings, Potsdam Germany, 1999.

  5. Jean-Pierre Chanod and Pasi Tapanainen. A robust finite-state grammar for French. In John Carroll, editor, Workshop on Robust Parsing, Prague, 1996. These proceedings are also available as Cognitive Science Research Paper #435; School of Cognitive and Computing Sciences, University of Sussex.

  6. P.C. Uit den Boogaart. Woordfrequenties in geschreven en gesproken Nederlands. Oosthoek, Scheltema & Holkema, Utrecht, 1975. Werkgroep Frequentie-onderzoek van het Nederlands.

  7. Dale Gerdemann and Gertjan van Noord. Transducers from rewrite rules with backreferences. In Ninth Conference of the European Chapter of the Association for Computational Linguistics, Bergen Norway, 1999.

  8. Gregory Grefenstette. Light parsing as finite-state filtering. In ECAI 1996 Workshop Extended Finite-State Models of Language, Budapest, 1996.

  9. John E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In Z. Kohavi, editor, The Theory of Machines and Computations, pages 189–196. Academic Press, 1971.

  10. John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.

  11. C. Douglas Johnson. Formal Aspects of Phonological Descriptions. Mouton, The Hague, 1972.

  12. Ronald Kaplan and Martin Kay. Regular models of phonological rule systems. Computational Linguistics, 20(3):331–379, 1994.

  13. Lauri Karttunen. The replace operator. In 33rd Annual Meeting of the Association for Computational Linguistics, M.I.T. Cambridge Mass., 1995.

  14. Lauri Karttunen. Directed replacement. In 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, 1996.

  15. Lauri Karttunen. The replace operator. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, pages 117–147. Bradford, MIT Press, 1997.

  16. Lauri Karttunen. The proper treatment of optimality theory in computational phonology. In Finite-state Methods in Natural Language Processing, pages 1–12, Ankara, 1998.

  17. George Anton Kiraz and Edmund Grimley-Evans. Multi-tape automata for speech and language systems: A Prolog implementation. In Derick Wood and Sheng Yu, editors, Automata Implementation. Second International Workshop on Implementing Automata, WIA ’97, pages 87–103. Springer Lecture Notes in Computer Science 1436, 1998.

  18. Mehryar Mohri, Fernando C.N. Pereira, and Michael Riley. A rational design for a weighted finite-state transducer library. In Automata Implementation. Second International Workshop on Implementing Automata, WIA ’97. Springer Verlag, 1998. Lecture Notes in Computer Science 1436.

  19. Mehryar Mohri and Richard Sproat. An efficient compiler for weighted rewrite rules. In 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, 1996.

  20. Alan Prince and Paul Smolensky. Optimality theory: Constraint interaction in generative grammar. Technical Report TR-2, Rutgers University Cognitive Science Center, New Brunswick, NJ, 1993. MIT Press, To Appear.

  21. D. Raymond and D. Wood. The grail papers. Technical Report TR-491, University of Western Ontario, Department of Computer Science, London Ontario, 1996.

  22. Emmanuel Roche. Parsing with finite-state transducers. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing, pages 241–281. MIT Press, Cambridge, 1997.

  23. Emmanuel Roche and Yves Schabes. Introduction. In Emmanuel Roche and Yves Schabes, editors, Finite-State Language Processing. MIT Press, Cambridge, Mass., 1997.

  24. Gertjan van Noord. FSA Utilities: A toolbox to manipulate finite-state automata. In Darrell Raymond, Derick Wood, and Sheng Yu, editors, Automata Implementation, pages 87–108. Springer Verlag, 1997. Lecture Notes in Computer Science 1260.

  25. Gertjan van Noord. FSA Utilities (version 5), 1998. The FSA Utilities toolbox is available free of charge under the GNU General Public License at http://www.let.rug.nl/~vannoord/Fsa/.

  26. Gertjan van Noord. The treatment of epsilon moves in subset construction. In Finite-state Methods in Natural Language Processing, Ankara, 1998. cmp-lg/9804003. Accepted for Computational Linguistics.

  27. Bruce W. Watson. Taxonomies and Toolkits of Regular Language Algorithms. PhD thesis, Eindhoven University of Technology, 1995.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

van Noord, G., Gerdemann, D. (2001). An Extendible Regular Expression Compiler for Finite-State Approaches in Natural Language Processing. In: Boldt, O., Jürgensen, H. (eds) Automata Implementation. WIA 1999. Lecture Notes in Computer Science, vol 2214. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45526-4_12

  • DOI: https://doi.org/10.1007/3-540-45526-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42812-1

  • Online ISBN: 978-3-540-45526-4

  • eBook Packages: Springer Book Archive
