Abstract
Real-world natural language sentences are often long and complex, and contain unexpected grammatical constructions. They may also include noise and ungrammatical input. This paper describes the Controlled Skip Parser, a program that parses such real-world sentences by skipping some of the words in the sentence. The novel feature of this parser is that it controls its own behavior by determining which words to skip, without using domain-specific knowledge. The parser is a priority-based chart parser, and its behavior is controlled by assigning appropriate priority levels to the constituents in the chart. The priority levels are assigned using statistical information (n-grams), which can be thought of as a generalized approximation of the grammar learned from past successful experiences. The control mechanism yields a substantial speed-up and a reduction in memory usage. Experiments on real newspaper articles are presented, and our experience with this parser in a machine translation system is described.
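The idea of a priority-based chart parser that may skip words can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy grammar, the lexicon, the cost values, and the names `RULES`, `LEXICON`, `BIGRAM_COST`, `SKIP_COST`, `DEFAULT_COST`, and `parse` are all assumptions made for illustration. Constituents wait on an agenda ordered by cost, two constituents may combine across a gap of skipped words, and a made-up bigram table over category pairs stands in for the n-gram statistics.

```python
import heapq
from itertools import count

# Hypothetical toy grammar (binary rules) and lexicon; not taken from the paper.
RULES = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

# Made-up bigram costs over category pairs (lower means more plausible).
# They stand in for n-gram statistics; the numbers are illustrative only.
BIGRAM_COST = {("Det", "N"): 0.1, ("NP", "VP"): 0.2, ("V", "NP"): 0.2}
SKIP_COST = 1.0     # assumed penalty for each skipped word
DEFAULT_COST = 2.0  # assumed cost for a category pair missing from the table
ROOT = "<ROOT>"     # internal marker for a goal constituent covering the input


def parse(words, goal="S"):
    """Return (cost, skipped_words) for the cheapest parse, or None."""
    n = len(words)
    tie = count()   # unique tie-breaker so the heap never compares categories
    agenda = []     # priority queue of (cost, tie, cat, start, end, covered)
    chart = {}      # (cat, start, end) -> (cost, covered word indices)

    # Seed the agenda with one constituent per known word; unknown words
    # produce no constituent and can only be skipped.
    for i, word in enumerate(words):
        cat = LEXICON.get(word)
        if cat is not None:
            heapq.heappush(agenda, (0.0, next(tie), cat, i, i + 1, (i,)))

    while agenda:
        cost, _, cat, start, end, covered = heapq.heappop(agenda)
        if (cat, start, end) in chart:
            continue  # a cheaper constituent with this label and span exists
        if cat == ROOT:
            skipped = [w for i, w in enumerate(words) if i not in covered]
            return cost, skipped
        chart[(cat, start, end)] = (cost, covered)

        if cat == goal:
            # Charge for words skipped before and after the goal constituent.
            edge_skips = start + (n - end)
            heapq.heappush(agenda, (cost + SKIP_COST * edge_skips,
                                    next(tie), ROOT, 0, n, covered))

        # Combine with every constituent already in the chart, allowing a gap.
        for (ocat, ostart, oend), (ocost, ocovered) in list(chart.items()):
            if oend <= start:      # the other constituent lies to the left
                left, right, gap = ocat, cat, start - oend
                new_start, new_end = ostart, end
                new_covered = ocovered + covered
            elif ostart >= end:    # the other constituent lies to the right
                left, right, gap = cat, ocat, ostart - end
                new_start, new_end = start, oend
                new_covered = covered + ocovered
            else:
                continue           # overlapping spans cannot combine
            parent = RULES.get((left, right))
            if parent is None:
                continue
            new_cost = (cost + ocost
                        + BIGRAM_COST.get((left, right), DEFAULT_COST)
                        + SKIP_COST * gap)
            heapq.heappush(agenda, (new_cost, next(tie), parent,
                                    new_start, new_end, new_covered))
    return None


if __name__ == "__main__":
    print(parse("the dog uh saw the cat".split()))
```

In this sketch the filler word "uh" has no lexicon entry, so the only complete analysis is one that skips it. Because all costs are non-negative and the agenda always expands the cheapest constituent first, expensive combinations that skip many words are reached only if cheaper ones fail, which is the kind of control over skipping that the abstract describes in general terms.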
Cite this article
Yamada, K. A Controlled Skip Parser. Machine Translation 13, 1–15 (1998). https://doi.org/10.1023/A:1008044302570
DOI: https://doi.org/10.1023/A:1008044302570