research-article

Identifying the Truth: Aggregation of Named Entity Extraction Results

Authors: Katja Pfeifer, Johannes Meinecke

IIWAS '13: Proceedings of International Conference on Information Integration and Web-based Applications & Services

Pages 565 - 574

https://doi.org/10.1145/2539150.2539160

Published: 02 December 2013


Abstract

Huge amounts of textual information relevant for market analysis, trend detection, and product monitoring can be found on the Web. To exploit that knowledge, a number of extraction services have been proposed that extract and categorize entities from a given text. Prior work has shown that combining individual extractors can increase quality. However, no existing system is fully able to combine real-world extraction services that differ substantially in the entity types they extract and the schemata they use. In this paper, we propose an aggregation system and a corresponding aggregation process that can be used for these services. We present a number of novel aggregation techniques that incorporate schema information as well as characteristics specific to entity extraction into the aggregation process. The aggregation system is evaluated extensively on six real-world named entity recognition services and compared with state-of-the-art approaches.
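The central difficulty named in the abstract, services that cover different entity types and use different schemata, can be illustrated with a toy baseline. This is not the aggregation process proposed in the paper, whose details are not given here. It is a plain majority vote that first maps each service's native labels to a shared schema and then counts votes only among services whose schema can express the candidate type, so that a service is not treated as disagreeing about a type it never extracts. The service names (`svc_a`, `svc_b`, `svc_c`), the `TYPE_MAP` table, the `aggregate` function and its `threshold` parameter are all hypothetical, and the sketch assumes that services agree exactly on character offsets, which real services often do not.

```python
from collections import defaultdict

# Hypothetical mapping from each service's native type labels to a shared schema.
TYPE_MAP = {
    "svc_a": {"PER": "Person", "ORG": "Organization"},
    "svc_b": {"Person": "Person", "Company": "Organization", "Product": "Product"},
    "svc_c": {"person": "Person", "location": "Location"},
}


def aggregate(results, type_map, threshold=0.5):
    """Schema-aware majority vote over (start, end, native_type) annotations.

    results maps a service name to its list of (start, end, native_type).
    A candidate (start, end, shared_type) is accepted when the fraction of
    services able to emit shared_type that actually reported it exceeds
    threshold. Assumes exact offset agreement between services.
    """
    # Shared types each service is able to emit at all.
    coverage = {svc: set(mapping.values()) for svc, mapping in type_map.items()}
    votes = defaultdict(set)  # (start, end, shared_type) -> reporting services
    for svc, annotations in results.items():
        for start, end, native in annotations:
            shared = type_map[svc].get(native)
            if shared is not None:  # drop labels outside the shared schema
                votes[(start, end, shared)].add(svc)
    accepted = []
    for (start, end, shared), voters in votes.items():
        # Only services whose schema covers this type may vote on it.
        eligible = [svc for svc in results if shared in coverage[svc]]
        if len(voters) / len(eligible) > threshold:
            accepted.append((start, end, shared))
    return sorted(accepted)


# Hypothetical service outputs for one sentence.
text = "Tim Cook presented the iPhone at Apple."
results = {
    "svc_a": [(0, 8, "PER"), (33, 38, "ORG")],
    "svc_b": [(0, 8, "Person"), (23, 29, "Product"), (33, 38, "Company")],
    "svc_c": [(0, 8, "person"), (33, 38, "person")],  # mislabels "Apple"
}
print(aggregate(results, TYPE_MAP))
# [(0, 8, 'Person'), (23, 29, 'Product'), (33, 38, 'Organization')]
```

Restricting the vote to eligible services is what keeps the `iPhone` mention: a plain vote over all three services would reject it (1 of 3), whereas here `svc_b` is the only service that can emit `Product`. The same rule still rejects the incorrect `Person` label that `svc_c` assigns to "Apple" (1 of 3 eligible). The flip side is that a type covered by a single service is accepted on that service's word alone.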


Cited By

K. Pfeifer and E. Peukert. Integration of Text Mining Taxonomies. In Knowledge Discovery, Knowledge Engineering and Knowledge Management, pages 39--55, 2015. Online publication date: 25 April 2015. https://doi.org/10.1007/978-3-662-46549-3_3

Information

Published In

IIWAS '13: Proceedings of International Conference on Information Integration and Web-based Applications & Services

December 2013

753 pages

ISBN: 9781450321136

DOI: 10.1145/2539150

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

@WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IIWAS '13: The 15th International Conference on Information Integration and Web-based Applications & Services

December 2-4, 2013

Vienna, Austria

Bibliometrics

Total citations: 1
Total downloads: 98
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 15 Feb 2025.