Beyond linear chain: a journey through conditional random fields for information extraction from text

Author:

Diego Marcheggiani

ACM SIGIR Forum, Volume 48, Issue 1

Page 44

https://doi.org/10.1145/2641383.2641394

Published: 26 June 2014

Abstract

Information Extraction (IE) is a field at the crossroads of IR and NLP that studies methods for extracting information from text in such a way that this information can be used to populate a structured information repository. Most approaches to IE rely on supervised learning; the best-performing of these belong to the class of probabilistic graphical models and, in particular, to the class of Conditional Random Fields (CRFs). In this thesis we investigate two major aspects of textual IE via CRFs: (a) the creation of CRF models that can outperform the commonly adopted linear-chain CRFs, and (b) the creation of methods for ensuring the quality of training data and for assessing the impact of training data quality on the accuracy of CRF systems for IE.
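The linear-chain CRF used as the baseline throughout scores each candidate label sequence as a sum of per-position (emission) terms and label-to-label (transition) terms, and predicts the highest-scoring sequence. As an illustration only, a minimal Viterbi decoder in Python, assuming the feature scores have already been computed into hypothetical `emissions` and `transitions` tables; this is a generic sketch, not code from the thesis:

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring label sequence and its score.

    emissions[t][y]   : hypothetical score of label y at position t
    transitions[a][b] : hypothetical score of moving from label a to label b
    """
    n_labels = len(emissions[0])
    # best[y] = best score of any prefix ending in label y at the current position
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for y in range(n_labels):
            # previous label maximizing (prefix score + transition into y)
            prev = max(range(n_labels), key=lambda a: best[a] + transitions[a][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final label
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]
```

Replacing `max` with a log-sum-exp in the same recursion gives the forward algorithm, which a CRF needs at training time to normalize sequence scores into probabilities.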

We start by addressing the task of IE from medical documents written in Italian. We propose two novel approaches: (i) a cascaded, two-stage method composed of two layers of CRFs, and (ii) a confidence-weighted ensemble method that combines standard linear-chain CRFs and the proposed two-stage method. Both proposed models are shown to outperform a standard linear-chain CRF system.
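The abstract does not specify how the ensemble weighs its members, so the following is only one plausible reading: each tagger contributes, for every token, its predicted label and a confidence (for instance a marginal probability), and the label with the largest weighted sum of confidences wins. The per-model weights, the label names such as `B-DRUG`, and the function name are all hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One model's output for one token: (predicted label, confidence in [0, 1]).
Prediction = Tuple[str, float]


def confidence_weighted_vote(per_model: Sequence[Sequence[Prediction]],
                             model_weights: Sequence[float]) -> List[str]:
    """Combine token-level predictions from several taggers (hypothetical scheme).

    per_model[m][t]  : model m's (label, confidence) for token t
    model_weights[m] : hypothetical global weight assigned to model m
    """
    n_tokens = len(per_model[0])
    combined: List[str] = []
    for t in range(n_tokens):
        votes: Dict[str, float] = defaultdict(float)
        for preds, weight in zip(per_model, model_weights):
            label, conf = preds[t]
            votes[label] += weight * conf  # each model votes with weight x confidence
        # ties go to the label seen first, since dicts keep insertion order
        combined.append(max(votes, key=lambda lab: votes[lab]))
    return combined
```

Under this scheme, a token on which the linear-chain tagger is unsure but the two-stage tagger is confident follows the two-stage tagger, and vice versa.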

We then investigate aspect-oriented sentence-level opinion mining from product reviews, which consists of predicting, for each sentence in a review, whether the sentence expresses a positive, neutral, or negative opinion (or no opinion at all) about a specific aspect of the product. We propose a set of increasingly powerful CRF-based models, including a hierarchical multi-label CRF scheme that jointly models the overall opinion expressed in a product review and the set of aspect-specific opinions expressed in each of its sentences. The proposed CRF models are shown to obtain better results than linear-chain CRFs.
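In the linear-chain baseline for this task, the sentences of a review form the positions of the chain, each labeled positive, neutral, negative, or none for a given aspect, and the model defines P(y | x) = exp(score(y)) / Z(x), where Z(x) sums exp(score) over all labelings. A minimal sketch of computing that probability with the forward algorithm, assuming precomputed per-sentence scores (`emissions` and `transitions` are hypothetical names). It illustrates the linear-chain baseline only, not the hierarchical multi-label model proposed in the thesis:

```python
import math
from typing import List, Sequence

# Hypothetical label set: one opinion label per sentence for a given aspect.
LABELS = ["positive", "neutral", "negative", "none"]

Matrix = Sequence[Sequence[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sequence_score(labels: Sequence[int], emissions: Matrix, transitions: Matrix) -> float:
    """Unnormalized score of one labeling: emission plus transition terms."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """log Z(x) computed with the forward algorithm."""
    n = len(emissions[0])
    # alpha[b] = log of the summed exp-scores of all prefixes ending in label b
    alpha: List[float] = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[a] + transitions[a][b] for a in range(n)]) + emissions[t][b]
            for b in range(n)
        ]
    return logsumexp(alpha)


def sequence_probability(labels: Sequence[int], emissions: Matrix, transitions: Matrix) -> float:
    """P(labels | x) under the linear-chain model."""
    return math.exp(sequence_score(labels, emissions, transitions)
                    - log_partition(emissions, transitions))
```

Working in log space keeps Z(x) from overflowing on long reviews; the cost is linear in the number of sentences and quadratic in the number of labels.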

We then study the impact of training data quality on the accuracy of an IE system, via experiments on a dataset for which inter-coder agreement data are available. Finally, we investigate active learning techniques for a type of semi-supervised CRF specifically devised for partially labeled sequences. We show that margin-based strategies always obtain the best results on all four tasks on which we tested them.
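One common way to make a margin-based strategy concrete for partially labeled sequences is to score each unlabeled token by the gap between its two most probable labels and to ask the annotator about the tokens with the smallest gaps. The sketch assumes that token marginals are already available from a trained CRF and that `pool` is a hypothetical input format; it is a generic illustration, not necessarily the exact strategy evaluated in the thesis:

```python
import heapq
from typing import List, Sequence, Tuple


def token_margin(marginals: Sequence[float]) -> float:
    """Gap between the two most probable labels for one token."""
    top2 = heapq.nlargest(2, marginals)
    return top2[0] - top2[1]


def select_queries(pool: Sequence[Sequence[Sequence[float]]], k: int) -> List[Tuple[int, int]]:
    """Return (sequence index, token index) pairs for the k smallest-margin tokens.

    pool[i][t] holds the model's marginal label distribution for token t of
    unlabeled sequence i (hypothetical input format).
    """
    scored = [
        (token_margin(dist), i, t)
        for i, seq in enumerate(pool)
        for t, dist in enumerate(seq)
    ]
    # smallest margin = the model is most torn between its top two labels
    return [(i, t) for _, i, t in heapq.nsmallest(k, scored)]
```

Querying single tokens rather than whole sequences fits a model that can train on partially labeled sequences, since the remaining tokens can stay unlabeled.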

Recommendations

A Linear-Chain CRF-Based Learning Approach for Web Opinion Mining
11th International Conference on Web Information Systems Engineering --- WISE 2010 - Volume 6488

The task of opinion mining from product reviews is to extract the product entities and determine whether the opinions on the entities are positive, negative or neutral. Reasonable performance on this task has been achieved by employing rule-based, ...
A linear-chain CRF-based learning approach for web opinion mining
WISE'10: Proceedings of the 11th international conference on Web information systems engineering

The task of opinion mining from product reviews is to extract the product entities and determine whether the opinions on the entities are positive, negative or neutral. Reasonable performance on this task has been achieved by employing rule-based, ...
Tool wear state recognition based on linear chain conditional random field model

Tool condition monitoring (TCM) system is paramount for guaranteeing the quality of workpiece and improving the efficiency of the machining process. To overcome the shortcomings of Hidden Markov Model (HMM) and improve the accuracy of tool wear ...

Published In

ACM SIGIR Forum Volume 48, Issue 1

June 2014

42 pages

ISSN:0163-5840

DOI:10.1145/2641383

Editors:
Fernando Diaz (Microsoft Research NYC, New York, NY)
Raman Chandrasekar (Serials Solutions, Seattle, WA)

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2014

Published in SIGIR Volume 48, Issue 1

Qualifiers

Abstract

Bibliometrics

Total Citations: 0
Total Downloads: 89
Downloads (Last 12 months): 1
Downloads (Last 6 weeks): 0

Reflects downloads up to 06 Jan 2025
