Abstract
Corpus linguistics involves the construction and annotation of large databases of text from spoken and written language. These corpora have applications in natural language processing (NLP) and in grammar teaching. Annotating them presents the knowledge acquisition (KA) “bottleneck” in a new application area. This paper introduces parse checking as a KA problem and compares it with other tree-oriented KA methodologies, such as laddering and clustering. It argues that corpus linguistics is a significant application area for KA. The tools discussed here have been used to process thousands of tree structures. The paper compares two tools in use on the ICE-GB corpus, one of which, ICE Tree II, exploits the structure of grammatical trees more fully than the other. Timing results show a main learning effect that dominates any difference between the two tools. However, the more integrated tool reduces the scope for error.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wallis, S., Nelson, G. (1997). Syntactic parsing as a knowledge acquisition problem. In: Plaza, E., Benjamins, R. (eds) Knowledge Acquisition, Modeling and Management. EKAW 1997. Lecture Notes in Computer Science, vol 1319. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0026792
DOI: https://doi.org/10.1007/BFb0026792
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63592-5
Online ISBN: 978-3-540-69606-3
eBook Packages: Springer Book Archive