Embedded machine learning systems for natural language processing: A general framework

Cardie, Claire

doi:10.1007/3-540-60925-3_56

Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1040)

Abstract

This paper presents Kenmore, a general framework for knowledge acquisition for natural language processing (NLP) systems. To ease the acquisition of knowledge in new domains, Kenmore exploits an online corpus using robust sentence analysis and embedded symbolic machine learning techniques while requiring only minimal human intervention. By treating all problems in ambiguity resolution as classification tasks, the framework uniformly addresses a range of subproblems in sentence analysis, each of which has traditionally required a separate computational mechanism. In a series of experiments, we demonstrate the successful use of Kenmore for learning solutions to several problems in lexical and structural ambiguity resolution. We argue that the learning and knowledge acquisition components should be embedded components of the NLP system in that (1) learning should take place within the larger natural language understanding system as it processes text, and (2) the learning components should be evaluated in the context of practical language-processing tasks.
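
As a rough illustration of what "treating ambiguity resolution as classification" can look like, the sketch below uses one generic classifier for two different ambiguity problems. It is not Kenmore's implementation: the nearest-neighbor scheme with a simple feature-overlap measure, the feature names (`prev_pos`, `next_pos`, `prep`, `obj_class`), the labels, and the toy cases are all assumptions made up for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A case pairs a context (feature name -> value) with a class label.
Case = Tuple[Dict[str, str], str]


def overlap(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count the features on which two contexts agree."""
    return sum(1 for key, value in a.items() if b.get(key) == value)


def classify(case_base: List[Case], query: Dict[str, str], k: int = 3) -> str:
    """Return the majority label among the k stored cases most similar to query."""
    ranked = sorted(case_base, key=lambda case: overlap(case[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical lexical ambiguity task: is an ambiguous word a noun or a verb?
pos_cases: List[Case] = [
    ({"prev_pos": "det", "next_pos": "verb"}, "noun"),
    ({"prev_pos": "det", "next_pos": "prep"}, "noun"),
    ({"prev_pos": "adj", "next_pos": "verb"}, "noun"),
    ({"prev_pos": "pron", "next_pos": "det"}, "verb"),
    ({"prev_pos": "modal", "next_pos": "det"}, "verb"),
    ({"prev_pos": "pron", "next_pos": "noun"}, "verb"),
]

# Hypothetical structural ambiguity task: where does a prepositional phrase attach?
pp_cases: List[Case] = [
    ({"prep": "with", "obj_class": "instrument"}, "attach-to-verb"),
    ({"prep": "with", "obj_class": "food"}, "attach-to-noun"),
    ({"prep": "in", "obj_class": "location"}, "attach-to-verb"),
]

if __name__ == "__main__":
    # The same classify() call serves both tasks; only the case base changes.
    print(classify(pos_cases, {"prev_pos": "det", "next_pos": "prep"}))
    print(classify(pp_cases, {"prep": "with", "obj_class": "instrument"}, k=1))
```

The point of the sketch is the design choice the abstract describes: once each ambiguity is phrased as "context features in, class label out", a single learning mechanism can be reused, and only the training cases differ per task.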

Author information

Authors and Affiliations

Department of Computer Science, Cornell University, 14853, Ithaca, NY, USA
Claire Cardie

Editor information

Stefan Wermter, Ellen Riloff, Gabriele Scheler

Cite this paper

Cardie, C. (1996). Embedded machine learning systems for natural language processing: A general framework. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_56

DOI: https://doi.org/10.1007/3-540-60925-3_56
Published: 07 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive