Abstract
In this paper we describe a method for handling linguistic data effectively by means of covering and inhibiting patterns, that is, patterns that “compete” with each other. A methodology for developing such patterns is outlined. Applications in morphology, hyphenation and part-of-speech tagging are shown. This pattern-driven approach to language engineering allows linguistic expertise to be combined with data learned from corpora, giving a layering of knowledge. Searching for information in the pattern database (the dictionary problem) is fast: as with other finite-state approaches, lookup time is linear in the length of the word being looked up.
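To make the idea of competing patterns concrete, the following is a minimal sketch in the style of TeX hyphenation patterns, where a digit inside a pattern gives a level, odd levels cover (permit) a break, even levels inhibit it, and the highest level at each position wins. It is an illustration under stated assumptions, not the implementation described in the paper: the function names (`parse_pattern`, `build_trie`, `break_points`, `hyphenate`) and the three toy patterns are hypothetical, and the trie is only one possible way to store the pattern database.

```python
from typing import Dict, List, Tuple

END = ""  # hypothetical convention: key under which a trie node stores its levels


def parse_pattern(pattern: str) -> Tuple[str, List[int]]:
    """Split a pattern such as 'a1b' into letters 'ab' and levels [0, 1, 0]."""
    letters: List[str] = []
    levels: List[int] = [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)   # level for the gap before the next letter
        else:
            letters.append(ch)
            levels.append(0)       # gap after this letter, default level 0
    return "".join(letters), levels


def build_trie(patterns: List[str]) -> Dict:
    """Store all patterns in a trie keyed by their letters."""
    root: Dict = {}
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        node = root
        for ch in letters:
            node = node.setdefault(ch, {})
        node[END] = levels
    return root


def break_points(word: str, trie: Dict) -> List[int]:
    """Return indices k such that a break is allowed between word[k-1] and word[k]."""
    text = "." + word.lower() + "."          # '.' marks the word boundaries
    scores = [0] * (len(text) + 1)           # scores[i]: level of the gap before text[i]
    for start in range(len(text)):
        node = trie
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:
                break
            levels = node.get(END)
            if levels is not None:
                # Patterns compete: the highest level at each gap wins.
                for offset, level in enumerate(levels):
                    idx = start + offset
                    scores[idx] = max(scores[idx], level)
    # Odd winning level = covering pattern won, even = inhibiting pattern won.
    return [k for k in range(1, len(word)) if scores[k + 1] % 2 == 1]


def hyphenate(word: str, trie: Dict) -> str:
    """Join the pieces of the word with '-' at the allowed break points."""
    pieces, prev = [], 0
    for cut in break_points(word, trie):
        pieces.append(word[prev:cut])
        prev = cut
    pieces.append(word[prev:])
    return "-".join(pieces)


# Toy patterns (assumptions, not taken from any real pattern set):
#   'a1b'   level 1 covers:   allow a break between 'a' and 'b'
#   'a2b.'  level 2 inhibits: forbid that break at the end of a word
#   'ba3b.' level 3 covers:   re-allow it in the more specific context 'bab.'
two_levels = build_trie(["a1b", "a2b."])
three_levels = build_trie(["a1b", "a2b.", "ba3b."])

print(hyphenate("abab", two_levels))    # a-bab
print(hyphenate("abab", three_levels))  # a-ba-b
```

Each start position walks the trie for at most the length of the longest pattern, so for a pattern set with bounded pattern length the work grows linearly with the length of the word, which is the behaviour the abstract refers to. Adding a higher level that overrides a lower one, as in the third toy pattern, is one way to picture the layering of hand-written and corpus-derived knowledge.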
This research has been partially supported by the Czech Ministry of Education under the Grant VS97028 and by the Grant CEZ:J07/98:143300003.
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sojka, P. (2000). Competing Patterns for Language Engineering. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2000. Lecture Notes in Computer Science, vol 1902. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45323-7_27
DOI: https://doi.org/10.1007/3-540-45323-7_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41042-3
Online ISBN: 978-3-540-45323-9
eBook Packages: Springer Book Archive