Abstract
The quality of learnt natural language grammars can be enhanced by exploiting the linguistic devices found in a corpus. This paper considers one such device, namely punctuation. After briefly considering the linguistics of punctuation, the paper presents a model capturing some of its properties. A series of experiments learning unification-based natural language grammars, using the Spoken English Corpus as data, then demonstrates that even a simple model of punctuation increases the plausibility of learnt grammars over grammars learnt without the use of punctuation.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Osborne, M. (1996). Can punctuation help learning? In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_62
DOI: https://doi.org/10.1007/3-540-60925-3_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive