
A Controlled Skip Parser

Machine Translation

Abstract

Real-world natural language sentences are often long and complex, and contain unexpected grammatical constructions. They even include noise and ungrammaticality. This paper describes the Controlled Skip Parser, a program that parses such real-world sentences by skipping some of the words in the sentence. The new feature of this parser is that it controls its behavior by finding out which words to skip, without using domain-specific knowledge. The parser is a priority-based chart parser. By assigning appropriate priority levels to the constituents in the chart, the parser's behavior is controlled. Statistical information is used for assigning priority levels. The statistical information (n-grams) can be thought of as a generalized approximation of the grammar learned from past successful experiences. The control mechanism gives a great speed-up and reduction in memory usage. Experiments on real newspaper articles are shown, and our experience with this parser in a machine translation system is described.
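The idea of a priority-based chart parser that may skip words can be illustrated with a minimal sketch. This is not the paper's implementation: the toy grammar, the lexicon, the `ROOT` marker, and the function name `parse_with_skips` are all hypothetical, and a fixed per-word skip cost stands in for the n-gram statistics that the paper uses to assign priority levels. Constituents are taken from an agenda in order of increasing cost, so parses that skip fewer (or cheaper) words are explored first and the search can stop at the first complete analysis.

```python
import heapq

# Hypothetical toy grammar in Chomsky normal form, for illustration only.
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}
BINARY = {("Det", "N"): "NP", ("V", "NP"): "VP", ("NP", "VP"): "S"}

# Assumption: a flat skip cost replaces the paper's n-gram-based priorities.
DEFAULT_SKIP_COST = 1.0
ROOT = "ROOT"  # marker for a goal constituent stretched over the whole input


def parse_with_skips(words, goal="S", skip_cost=None):
    """Return (cost, skipped_indices) of the cheapest parse, or None."""
    skip_cost = skip_cost or {}
    n = len(words)

    def gap(i, j):
        # Cost of skipping words[i:j].
        return sum(skip_cost.get(words[k], DEFAULT_SKIP_COST) for k in range(i, j))

    # Agenda entries: (cost, start, end, symbol, indices of words used).
    agenda = []
    for i, word in enumerate(words):
        for cat in sorted(LEXICON.get(word, ())):
            heapq.heappush(agenda, (0.0, i, i + 1, cat, (i,)))

    chart = {}  # (start, end, symbol) -> (cost, used); first entry is cheapest
    while agenda:
        cost, i, j, sym, used = heapq.heappop(agenda)
        if (i, j, sym) in chart:
            continue  # a cheaper copy of this constituent was already processed
        if sym == ROOT:
            return cost, [k for k in range(n) if k not in used]
        # Combine with constituents already in the chart, allowing a gap
        # of skipped words between the two children.
        for (k, l, other), (ocost, oused) in list(chart.items()):
            if l <= i and (other, sym) in BINARY:
                parent = BINARY[(other, sym)]
                heapq.heappush(
                    agenda, (cost + ocost + gap(l, i), k, j, parent, oused + used))
            if j <= k and (sym, other) in BINARY:
                parent = BINARY[(sym, other)]
                heapq.heappush(
                    agenda, (cost + ocost + gap(j, k), i, l, parent, used + oused))
        chart[(i, j, sym)] = (cost, used)
        if sym == goal:
            # Skipping leading and trailing words turns a goal into a full parse.
            heapq.heappush(agenda, (cost + gap(0, i) + gap(j, n), 0, n, ROOT, used))
    return None


result = parse_with_skips("the dog uh saw the cat".split())
```

Here `result` holds the total skip cost and the indices of the skipped words. Passing a `skip_cost` dictionary that makes some words cheaper to skip changes the order in which constituents leave the agenda, which is the role priorities play in this sketch.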




Cite this article

Yamada, K. A Controlled Skip Parser. Machine Translation 13, 1–15 (1998). https://doi.org/10.1023/A:1008044302570


  • DOI: https://doi.org/10.1023/A:1008044302570
