Article, DOI: 10.1145/1142473.1142595

Managing information extraction: state of the art and research directions

Published: 27 June 2006

Abstract

This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text). We first survey research on information extraction in the database, AI, NLP, IR, and Web communities in recent years. Then we discuss why this is the right time for the database community to actively participate and address the problem of managing information extraction (including in particular the challenges of maintaining and querying the extracted information, and accounting for the imprecision and uncertainty inherent in the extraction process). Finally, we show how interested researchers can take the next step, by pointing to open problems, available datasets, applicable standards, and software tools. We do not assume prior knowledge of text management, NLP, extraction techniques, or machine learning.

    Published In

    SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
    June 2006
    830 pages
    ISBN: 1595934340
    DOI: 10.1145/1142473
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. database management systems
    2. information extraction
    3. semantic integration

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS06

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2025) Corporate relation extraction for the construction of knowledge-bases against tax fraud. Knowledge-Based Systems 311 (113026). DOI: 10.1016/j.knosys.2025.113026. Online publication date: Feb-2025.
    • (2024) The Semantics of COVID-19 Web Data: Ontology Learning and Population. Current Materials Science 17:1 (44-64). DOI: 10.2174/2666145416666230111113534. Online publication date: Mar-2024.
    • (2022) Knowledge Graphs for Social Good: An Entity-Centric Search Engine for the Human Trafficking Domain. IEEE Transactions on Big Data 8:3 (592-606). DOI: 10.1109/TBDATA.2017.2763164. Online publication date: 1-Jun-2022.
    • (2021) Location Extraction to Inform a Spanish-Speaking Community About Traffic Incidents. Handbook of Research on Natural Language Processing and Smart Service Systems (347-367). DOI: 10.4018/978-1-7998-4730-4.ch016. Online publication date: 2021.
    • (2021) Computational Literature Reviews: Method, Algorithms, and Roadmap. Organizational Research Methods 26:1 (107-138). DOI: 10.1177/1094428121991230. Online publication date: 9-Mar-2021.
    • (2019) A knowledge construction methodology to automate case‐based learning using clinical documents. Expert Systems 37:1. DOI: 10.1111/exsy.12401. Online publication date: 10-Apr-2019.
    • (2019) Keyword Search on RDF Datasets. Advances in Information Retrieval (332-336). DOI: 10.1007/978-3-030-15719-7_44. Online publication date: 7-Apr-2019.
    • (2018) Natural Language Data Management and Interfaces. Synthesis Lectures on Data Management 10:2 (1-156). DOI: 10.2200/S00866ED1V01Y201807DTM049. Online publication date: 13-Aug-2018.
    • (2018) Cost-effective conceptual design using taxonomies. The VLDB Journal — The International Journal on Very Large Data Bases 27:3 (369-394). DOI: 10.1007/s00778-018-0501-1. Online publication date: 1-Jun-2018.
    • (2018) Web Information Extraction. Encyclopedia of Database Systems (4620-4629). DOI: 10.1007/978-1-4614-8265-9_459. Online publication date: 7-Dec-2018.
