DOI: 10.1145/2063576.2063935
poster

Constructing efficient information extraction pipelines

Published: 24 October 2011

ABSTRACT

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" depends on the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines under a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
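The abstract does not describe the optimization method itself. As a minimal sketch under assumptions of our own, the code below illustrates why scheduling matters. It models each algorithm with a hypothetical per-unit run-time, a hypothetical "keep ratio" (the fraction of its input it passes on as relevant), and its input prerequisites. It then ranks every dependency-respecting order by expected cost. The algorithm names, the cost model, and all numbers are invented for illustration. This is a plain brute-force filter-ordering search, not the authors' method.

```python
from itertools import permutations
from typing import List, NamedTuple, Sequence, Tuple


class Algorithm(NamedTuple):
    """Hypothetical profile of one IE algorithm (illustrative values only)."""
    name: str
    time_per_unit: float        # assumed run-time per unit of text it receives
    keep_ratio: float           # assumed fraction of its input passed on as relevant
    requires: Tuple[str, ...]   # algorithms whose output it needs as input


def is_valid(order: Sequence[Algorithm]) -> bool:
    """True if every algorithm comes after all of its prerequisites."""
    done = set()
    for algo in order:
        if not set(algo.requires) <= done:
            return False
        done.add(algo.name)
    return True


def expected_cost(order: Sequence[Algorithm]) -> float:
    """Each stage runs only on the text that all earlier stages kept."""
    cost, remaining = 0.0, 1.0
    for algo in order:
        cost += remaining * algo.time_per_unit
        remaining *= algo.keep_ratio
    return cost


def rank_schedules(algos: List[Algorithm]) -> List[Tuple[List[str], float]]:
    """Brute-force all valid orders, cheapest first (viable only for small pipelines)."""
    ranked = [
        ([a.name for a in order], expected_cost(order))
        for order in permutations(algos)
        if is_valid(order)
    ]
    return sorted(ranked, key=lambda item: item[1])


# Invented example: one costly parser plus two cheap, filtering recognizers.
pipeline = [
    Algorithm("tokenizer", 1.0, 1.0, ()),
    Algorithm("parser", 20.0, 1.0, ("tokenizer",)),
    Algorithm("time_recognizer", 2.0, 0.3, ("tokenizer",)),
    Algorithm("money_recognizer", 3.0, 0.2, ("tokenizer",)),
]

ranked = rank_schedules(pipeline)
best_names, best_cost = ranked[0]
worst_names, worst_cost = ranked[-1]
print(best_names, round(best_cost, 2))
# ['tokenizer', 'time_recognizer', 'money_recognizer', 'parser'] 5.1
print(worst_names, round(worst_cost, 2))
# ['tokenizer', 'parser', 'money_recognizer', 'time_recognizer'] 24.4
print(round(worst_cost / best_cost, 1))  # 4.8
```

Under these made-up numbers, all six valid schedules produce the same output. Even so, the cheapest one runs almost five times faster than the most expensive one, because the filtering recognizers discard irrelevant text before the costly parser sees it. Exhaustive search grows factorially with the number of algorithms, so a realistic pipeline would likely need a smarter strategy. The abstract does not say which strategy the paper uses.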

        Published in
        CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
        October 2011
        2712 pages
        ISBN:9781450307178
        DOI:10.1145/2063576

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
