Abstract
Many experiments have shown that traditional approaches in both Natural Language Processing (NLP) and Information Retrieval (IR) are not effective enough for extracting information from text. Shallow techniques (statistics, keyword analysis, etc.) tend to be efficient and portable but imprecise, whereas linguistic approaches tend to be very precise but neither robust nor efficient. Integrating NLP and IR is the challenge for the evolution of text processing systems over the next few years. This paper presents an architecture that integrates shallow and linguistic processing. Shallow techniques restrict the linguistic analysis to the interesting sections of a text and help the parser reduce its overhead. The linguistic analyzer then carefully extracts the information, controlling the combinatorics of parsing and any misdirected parsing efforts. Preliminary results show that the architecture has considerable advantages over traditional approaches to information extraction from text.
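The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the trigger words, the toy fault-report domain, the function names, and the single regular expression standing in for the linguistic analyzer are all assumptions made for illustration. It only shows the control flow, in which a cheap shallow filter decides which segments the more expensive analysis is allowed to see.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger words for a toy fault-report domain (assumption).
TRIGGERS = {"failure", "fault", "broken", "leak"}

# Stand-in for the linguistic analyzer: one pattern instead of a real parser.
PATTERN = re.compile(r"the (\w+) (?:has|had) a (\w+)", re.IGNORECASE)


@dataclass
class Extraction:
    segment_index: int
    component: str
    problem: str


def shallow_score(segment: str) -> int:
    """Cheap relevance score: number of trigger words in the segment."""
    words = re.findall(r"[a-z]+", segment.lower())
    return sum(1 for w in words if w in TRIGGERS)


def select_segments(segments, threshold=1):
    """Shallow stage: indices of segments worth analyzing in depth."""
    return [i for i, s in enumerate(segments) if shallow_score(s) >= threshold]


def deep_analyze(index, segment):
    """Deep stage (toy): fill a template from one segment, or return None."""
    m = PATTERN.search(segment)
    if m is None:
        return None
    return Extraction(index, m.group(1).lower(), m.group(2).lower())


def extract(segments):
    """Run the deep stage only on segments selected by the shallow stage."""
    results = []
    for i in select_segments(segments):
        found = deep_analyze(i, segments[i])
        if found is not None:
            results.append(found)
    return results


if __name__ == "__main__":
    text = [
        "The weather was fine.",
        "The pump had a leak near the valve.",
        "Engineers met on Monday.",
    ]
    for item in extract(text):
        print(item)
```

In this sketch only the second segment reaches `deep_analyze`, which is the point of the design: the cost of the precise stage is paid only where the shallow stage finds evidence of relevance.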
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ciravegna, F., Cancedda, N. (1995). Integrating shallow and linguistic techniques for Information extraction from text. In: Gori, M., Soda, G. (eds) Topics in Artificial Intelligence. AI*IA 1995. Lecture Notes in Computer Science, vol 992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60437-5_12
DOI: https://doi.org/10.1007/3-540-60437-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60437-2
Online ISBN: 978-3-540-47468-5
eBook Packages: Springer Book Archive