Competing Patterns for Language Engineering

Methods to Handle and Store Empirical Data

  • Conference paper
Text, Speech and Dialogue (TSD 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1902)

Abstract

In this paper we describe a method for handling linguistic data effectively by means of covering and inhibiting patterns, that is, patterns that “compete” with each other. A methodology for developing such patterns is outlined. Applications in morphology, hyphenation and part-of-speech tagging are shown. This pattern-driven approach to language engineering allows linguists' expertise to be combined with data learned from corpora (a layering of knowledge). Searching for information in a pattern database (the dictionary problem) is very fast: linear in the length of the word being looked up, as with other finite-state approaches.
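
To make the idea of competing patterns concrete, the following is a minimal sketch in Python, assuming a Liang-style encoding as used for TeX hyphenation: digits inside a pattern give levels, odd levels cover (allow) a break, even levels inhibit it, and the highest level at each position wins. Whether the paper uses exactly this encoding is an assumption, and the two patterns, the function names and the example word are purely illustrative, not taken from any real pattern set.

```python
def parse_pattern(pat):
    """Split a pattern such as '1na' into its letters 'na' and levels [1, 0, 0]."""
    letters = []
    levels = [0]
    for ch in pat:
        if ch.isdigit():
            levels[-1] = int(ch)      # level for the gap before the next letter
        else:
            letters.append(ch)
            levels.append(0)          # default level for the gap after this letter
    return "".join(letters), levels


def build_trie(patterns):
    """Store patterns in a nested-dict trie; the key None holds a pattern's levels."""
    root = {}
    for pat in patterns:
        letters, levels = parse_pattern(pat)
        node = root
        for ch in letters:
            node = node.setdefault(ch, {})
        node[None] = levels
    return root


def gap_levels(word, trie):
    """Return the winning level for every gap of '.word.' (dots mark word edges)."""
    text = "." + word + "."
    levels = [0] * (len(text) + 1)
    for start in range(len(text)):
        node = trie
        # Each walk is bounded by the longest pattern, so for a fixed
        # pattern set the total work is linear in the word length.
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:
                break
            pat_levels = node.get(None)
            if pat_levels:
                for k, lvl in enumerate(pat_levels):
                    levels[start + k] = max(levels[start + k], lvl)
    return levels


def hyphenate(word, trie):
    """Insert '-' wherever the winning level is odd (a covering pattern won)."""
    levels = gap_levels(word, trie)
    out = []
    for i, ch in enumerate(word):
        if i > 0 and levels[i + 1] % 2 == 1:
            out.append("-")
        out.append(ch)
    return "".join(out)


# Hypothetical toy patterns: '1na' covers a break before "na";
# '2na.' inhibits that break when "na" ends the word.
covering_only = build_trie(["1na"])
competing = build_trie(["1na", "2na."])

print(hyphenate("banana", covering_only))  # ba-na-na
print(hyphenate("banana", competing))      # ba-nana
```

In this sketch the inhibiting pattern overrides the covering one only at the position where both match, which is one way to layer an exception (for instance, a linguist's correction) on top of a general rule.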

This research has been partially supported by the Czech Ministry of Education under the Grant VS97028 and by the Grant CEZ:J07/98:143300003.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sojka, P. (2000). Competing Patterns for Language Engineering. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_27

  • DOI: https://doi.org/10.1007/3-540-45323-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41042-3

  • Online ISBN: 978-3-540-45323-9

  • eBook Packages: Springer Book Archive
