Abstract
This paper presents a system for computer-assisted grammar construction (CAGC). The CAGC system is designed to generate broad-coverage grammars for large natural language corpora by combining an extended inside-outside algorithm with an automatic phrase bracketing (AUTO) system, which supplies the extended algorithm with constituent information during learning. The paper demonstrates that the CAGC system can handle realistic natural language problems and that the AUTO system is useful in inside-outside-based grammar re-estimation. Performance results, including an analysis of the degree of coverage and bracketing precision, are presented for a grammar constructed for the Wall Street Journal (WSJ) corpus.
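As a rough illustration of how constituent information can constrain inside-outside learning, the sketch below computes the inside probability of a sentence under a small grammar in Chomsky normal form, ignoring any span that crosses a supplied bracket. This is only the general idea of bracket-constrained estimation, not the paper's extended algorithm, whose details the abstract does not give. The function names, the dictionary-based grammar encoding, and the half-open span convention are all assumptions made for this example.

```python
from collections import defaultdict


def crosses(span, bracket):
    """True if two half-open spans overlap without one containing the other."""
    (i, j), (a, b) = span, bracket
    return i < a < j < b or a < i < b < j


def inside_probability(words, lexical, binary, brackets=(), start="S"):
    """Inside probability of `words`, skipping spans that cross a bracket.

    Hypothetical grammar encoding (an assumption for this sketch):
      lexical: {(A, word): prob} for rules A -> word
      binary:  {(A, B, C): prob} for rules A -> B C
    """
    n = len(words)
    beta = defaultdict(float)  # beta[(i, j, A)] = P(A derives words[i:j])

    # Width-1 spans are filled from the lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, A)] += p

    # Wider spans are built bottom-up from pairs of adjacent sub-spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # The bracketing constraint: incompatible spans contribute nothing.
            if any(crosses((i, j), b) for b in brackets):
                continue
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    left = beta.get((i, k, B), 0.0)
                    right = beta.get((k, j, C), 0.0)
                    if left and right:
                        beta[(i, j, A)] += p * left * right

    return beta.get((0, n, start), 0.0)
```

With an ambiguous toy grammar such as `S -> S S` and `S -> a`, supplying a bracket over the first two words removes the right-branching analysis, so the probability mass is restricted to parses consistent with the bracketing. A full re-estimation step would also need the matching outside pass, which is omitted here.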
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Young, S.J., Shih, H.H. (1994). Computer assisted grammar construction. In: Carrasco, R.C., Oncina, J. (eds) Grammatical Inference and Applications. ICGI 1994. Lecture Notes in Computer Science, vol 862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58473-0_156
DOI: https://doi.org/10.1007/3-540-58473-0_156
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58473-5
Online ISBN: 978-3-540-48985-6
eBook Packages: Springer Book Archive