Integrating shallow and linguistic techniques for Information extraction from text

Ciravegna, Fabio; Cancedda, Nicola

doi:10.1007/3-540-60437-5_12

Fabio Ciravegna¹ & Nicola Cancedda¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 992)

Included in the following conference series:

Congress of the Italian Association for Artificial Intelligence

Abstract

Many experiments have shown that traditional approaches to both Natural Language Processing (NLP) and Information Retrieval (IR) are not effective enough to extract information from text. Shallow techniques (such as statistics and keyword analysis) tend to be imprecise, although efficient and transportable, whereas linguistic approaches tend to be very precise but neither robust nor efficient. Integrating NLP and IR is the challenge for the evolution of text processing systems over the next few years. This paper presents an architecture that integrates shallow and linguistic processing. Shallow techniques limit the linguistic analysis to the interesting sections of a text and help the parser reduce its overhead. The linguistic analyzer then carefully extracts the information, controlling the combinatorics of parsing and any misdirected parsing efforts. Preliminary results show that the architecture has considerable advantages over traditional approaches to information extraction from text.
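
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the keyword weights, the threshold, the sample domain, and every function name below are hypothetical, and a single regular expression stands in for the linguistic analyzer, which in the paper is a real parser. The sketch only shows the control flow, in which a cheap shallow score decides which sections receive the more expensive analysis.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword weights for an illustrative fault-report domain.
KEYWORDS = {"failure": 2.0, "fault": 2.0, "engine": 1.0, "replaced": 1.5}

# Stand-in for the linguistic analyzer: one pattern, not a real parser.
PATTERN = re.compile(r"\bthe (\w+) (failed|was replaced)\b", re.IGNORECASE)


@dataclass
class Extraction:
    component: str
    event: str


def split_sections(text):
    """Split a text into sections separated by blank lines."""
    return [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]


def shallow_score(section):
    """Shallow stage: sum the keyword weights over lowercase tokens."""
    tokens = re.findall(r"[a-z]+", section.lower())
    return sum(KEYWORDS.get(t, 0.0) for t in tokens)


def deep_analyze(section):
    """Placeholder for the linguistic stage: pattern-based extraction."""
    return [
        Extraction(m.group(1).lower(), m.group(2).lower())
        for m in PATTERN.finditer(section)
    ]


def extract(text, threshold=1.0):
    """Run the deep stage only on sections the shallow stage selects."""
    results = []
    for section in split_sections(text):
        if shallow_score(section) >= threshold:
            results.extend(deep_analyze(section))
    return results
```

With the default threshold, a section that contains none of the keywords is skipped without being analyzed at all, which is the saving the shallow stage is meant to provide. The cost is that any information in a skipped section is lost, so the threshold trades recall against processing effort.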

Author information

Nicola Cancedda
Present address: Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Italy

Authors and Affiliations

Istituto per la Ricerca Scientifica e Tecnologica, Loc. Pantè di Povo, I-38050, Trento, Italy
Fabio Ciravegna & Nicola Cancedda

Corresponding author

Correspondence to Nicola Cancedda.

Editor information

Marco Gori and Giovanni Soda

Cite this paper

Ciravegna, F., Cancedda, N. (1995). Integrating shallow and linguistic techniques for Information extraction from text. In: Gori, M., Soda, G. (eds) Topics in Artificial Intelligence. AI*IA 1995. Lecture Notes in Computer Science, vol 992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60437-5_12

DOI: https://doi.org/10.1007/3-540-60437-5_12
Published: 08 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60437-2
Online ISBN: 978-3-540-47468-5
eBook Packages: Springer Book Archive
