
Integrating shallow and linguistic techniques for Information extraction from text

  • Conference paper
Topics in Artificial Intelligence (AI*IA 1995)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 992)


Abstract

Many experiments have shown that traditional approaches to both Natural Language Processing (NLP) and Information Retrieval (IR) are not effective enough to extract information from text. Shallow techniques (such as statistics and keyword analysis) are efficient and portable but tend to be imprecise, whereas linguistic approaches tend to be very precise but are neither robust nor efficient. Integrating NLP and IR is a key challenge for the evolution of text processing systems over the next few years. This paper presents an architecture that integrates shallow and linguistic processing. Shallow techniques restrict the linguistic analysis to the relevant sections of the text and help the parser reduce its overhead. The linguistic analyzer then extracts the information carefully, controlling the combinatorics of parsing and avoiding misdirected parsing effort. Preliminary results show that the architecture has considerable advantages over traditional approaches to information extraction from text.
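The division of labour described in the abstract (a cheap shallow stage that selects relevant text and an expensive linguistic stage that runs only on what survives) can be illustrated with a minimal sketch. This is not the authors' system. The keyword lexicon, the weights, the `THRESHOLD` value and the `deep_parse` placeholder are all hypothetical. The placeholder only records which sentences reach it, and it receives the trigger words found by the shallow stage as one possible kind of hint that could reduce parsing overhead.

```python
import re

# Hypothetical domain lexicon (word -> weight). A real system might derive
# this from corpus statistics or hand-built keyword lists; this is an assumption.
KEYWORDS = {"acquisition": 2.0, "merger": 2.0, "acquire": 2.0,
            "company": 1.0, "shares": 1.0}
THRESHOLD = 2.0  # hypothetical relevance cut-off


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def shallow_filter(sentences):
    """Shallow stage: keep sentences whose summed keyword weight reaches
    THRESHOLD, paired with the trigger words that fired."""
    selected = []
    for s in sentences:
        triggers = [t for t in tokenize(s) if t in KEYWORDS]
        if sum(KEYWORDS[t] for t in triggers) >= THRESHOLD:
            selected.append((s, triggers))
    return selected


def deep_parse(sentence, triggers, log):
    """Hypothetical placeholder for the expensive linguistic analyzer.
    It only logs the call. A real parser could use `triggers` as anchors
    instead of analysing the sentence blindly from left to right."""
    log.append(sentence)
    return {"sentence": sentence, "anchors": triggers}


def extract(text, log):
    """Two-stage pipeline: cheap filtering first, deep parsing only on survivors."""
    return [deep_parse(s, trig, log)
            for s, trig in shallow_filter(split_sentences(text))]


if __name__ == "__main__":
    calls = []
    doc = "The weather was mild. Acme announced the acquisition of Beta. Shares rose."
    print(extract(doc, calls))
    print(f"parser invoked on {len(calls)} of {len(split_sentences(doc))} sentences")
```

In the usage example, only the sentence mentioning an acquisition reaches the parser. The other two sentences are discarded by the keyword filter, which shows where the efficiency gain of the shallow stage comes from.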



Author information

Correspondence to Nicola Cancedda.

Editor information

Marco Gori, Giovanni Soda


Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ciravegna, F., Cancedda, N. (1995). Integrating shallow and linguistic techniques for Information extraction from text. In: Gori, M., Soda, G. (eds) Topics in Artificial Intelligence. AI*IA 1995. Lecture Notes in Computer Science, vol 992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60437-5_12


  • DOI: https://doi.org/10.1007/3-540-60437-5_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60437-2

  • Online ISBN: 978-3-540-47468-5

  • eBook Packages: Springer Book Archive
