demonstration

The SystemT IDE: an integrated development environment for information extraction rules

Published: 12 June 2011

Abstract

Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics, and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process.
Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including tools for computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples.

Published In

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN: 9781450306614
DOI: 10.1145/1989323

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AQL
  2. SystemT
  3. information extraction
  4. pattern discovery
  5. provenance
  6. rule learning

Qualifiers

  • Demonstration

Conference

SIGMOD/PODS '11

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2023) Improving Developers’ Understanding of Regex Denial of Service Tools through Anti-Patterns and Fix Strategies. 2023 IEEE Symposium on Security and Privacy (SP), pp. 1238-1255. DOI: 10.1109/SP46215.2023.10179442. Online publication date: May-2023
  • (2016) Declarative Cleaning of Inconsistencies in Information Extraction. ACM Transactions on Database Systems 41:1, pp. 1-44. DOI: 10.1145/2877202. Online publication date: 7-Apr-2016
  • (2015) INDREX. Information Systems 53:C, pp. 124-144. DOI: 10.1016/j.is.2014.11.006. Online publication date: 1-Oct-2015
  • (2014) UIMA Ruta: Rapid development of rule-based information extraction applications. Natural Language Engineering 22:01, pp. 1-40. DOI: 10.1017/S1351324914000114. Online publication date: 8-Oct-2014
  • (2013) INDREX. Proceedings of the sixteenth international workshop on Data warehousing and OLAP, pp. 93-100. DOI: 10.1145/2513190.2513196. Online publication date: 28-Oct-2013
  • (2013) Semi-automatic Dictionary Curation for Domain-Specific Ontologies. Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 727-734. DOI: 10.1109/ICTAI.2013.112. Online publication date: 4-Nov-2013
  • (2012) Surfacing time-critical insights from social media. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 657-660. DOI: 10.1145/2213836.2213925. Online publication date: 20-May-2012
  • (2011) Facilitating pattern discovery for relation extraction with semantic-signature-based clustering. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 1415-1424. DOI: 10.1145/2063576.2063781. Online publication date: 24-Oct-2011
