Regular Expressions at Their Best: A Case for Rational Design

  • Conference paper
Implementation and Application of Automata (CIAA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6482)

Abstract

Regular expressions are often an integral part of program customization, and many algorithms have been proposed for transforming them into suitable data structures. These algorithms fall into two main classes: backtracking and automaton-based. Surprisingly, the latter class draws less attention than the former, even though automaton-based algorithms are the oldest and, when carefully designed, by far the fastest solutions. Only two open-source automaton-based implementations stand out: PCRE and the recent RE2 from Google. We have developed, and present here, a competitive automaton-based regular expression engine on top of the LGPL C++ Automata Standard Template Library (ASTL), whose efficiency and scalability remain unmatched and which distinguishes itself through a unique and rigorous STL-like design.
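
To make the automaton-based class concrete, here is a minimal C++ sketch of lockstep state-set simulation. It is an illustration under stated assumptions, not the engine described in the paper and not the ASTL, PCRE, or RE2 API: the pattern `(a|b)*abb`, the hand-built four-state automaton, and the names `step` and `matches` are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical hand-built NFA for the pattern (a|b)*abb.
// States: 0 = start, 1 = seen "a", 2 = seen "ab", 3 = seen "abb" (accepting).
const int kStates = 4;
using StateSet = unsigned;  // bit i set <=> state i is currently active

// Returns the set of states reachable from one state on one input character.
inline StateSet step(int state, char c) {
    assert(state >= 0 && state < kStates);
    switch (state) {
        case 0:  // 'a' may loop or start the "abb" suffix; 'b' only loops
            if (c == 'a') return (1u << 0) | (1u << 1);
            if (c == 'b') return (1u << 0);
            return 0;
        case 1: return c == 'b' ? (1u << 2) : 0;
        case 2: return c == 'b' ? (1u << 3) : 0;
        default: return 0;  // state 3 has no outgoing transitions
    }
}

// Advances every active state in lockstep. Each input character is read
// exactly once, so the cost is O(kStates * text.size()) with no backtracking.
bool matches(const std::string& text) {
    StateSet active = 1u << 0;  // only the start state is active initially
    for (char c : text) {
        StateSet next = 0;
        for (int s = 0; s < kStates; ++s)
            if (active & (1u << s)) next |= step(s, c);
        active = next;
        if (active == 0) return false;  // no live state left: reject early
    }
    return (active & (1u << 3)) != 0;  // accept iff state 3 is active
}
```

Calling `matches("ababb")` returns true, while `matches("abba")` returns false. A backtracking matcher would instead try one alternative at a time and rewind the input on failure, which is where its worst-case cost comes from.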

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Le Maout, V. (2011). Regular Expressions at Their Best: A Case for Rational Design. In: Domaratzki, M., Salomaa, K. (eds) Implementation and Application of Automata. CIAA 2010. Lecture Notes in Computer Science, vol 6482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18098-9_33

  • DOI: https://doi.org/10.1007/978-3-642-18098-9_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-18097-2

  • Online ISBN: 978-3-642-18098-9

  • eBook Packages: Computer Science, Computer Science (R0)
