DOI: 10.1145/2539150.2539160

Identifying the Truth: Aggregation of Named Entity Extraction Results

Published: 02 December 2013

Abstract

Huge amounts of textual information relevant to market analysis, trend analysis, or product monitoring can be found on the Web. To exploit that knowledge, a number of extraction services have been proposed that extract and categorize entities from a given text. Prior work has shown that combining individual extractors can increase extraction quality. So far, however, no system can sensibly combine real-world extraction services that differ substantially in the entity types they extract and the schemata they use. In this paper, we propose an aggregation system and a corresponding aggregation process for such services. We present a number of novel aggregation techniques that incorporate schema information as well as characteristics specific to entity extraction into the aggregation process. The aggregation system is evaluated extensively on six real-world named entity recognition services and compared to state-of-the-art approaches.
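
To make the idea of combining extractors concrete, the following is a minimal sketch of a simple majority-vote baseline over entity spans. It is not the aggregation technique proposed in the paper, which the abstract does not specify in detail. The service names, the type labels, the `TYPE_MAP` table, and the `min_votes` threshold are all hypothetical and chosen only for illustration. The sketch assumes each service returns character spans with its own type label and that a hand-written mapping onto a shared schema is available.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from each service's own type labels to a shared schema.
TYPE_MAP = {
    "service_a": {"PER": "Person", "ORG": "Organization"},
    "service_b": {"person": "Person", "company": "Organization"},
    "service_c": {"Person": "Person", "Organisation": "Organization"},
}


def aggregate(results, min_votes=2):
    """Majority vote over (start, end) spans.

    results: dict of service name -> list of (start, end, service_type).
    Returns a sorted list of (start, end, common_type) for every span whose
    most frequent mapped type received at least min_votes votes.
    """
    votes = defaultdict(Counter)
    for service, annotations in results.items():
        mapping = TYPE_MAP.get(service, {})
        for start, end, label in annotations:
            common = mapping.get(label)
            if common is None:
                continue  # label is not covered by the shared schema
            votes[(start, end)][common] += 1

    aggregated = []
    for (start, end), counter in votes.items():
        common, count = counter.most_common(1)[0]
        if count >= min_votes:
            aggregated.append((start, end, common))
    return sorted(aggregated)


# Illustrative input: three services annotating the same text.
example = {
    "service_a": [(0, 12, "PER"), (23, 26, "ORG")],
    "service_b": [(0, 12, "person"), (23, 26, "person")],
    "service_c": [(0, 12, "Person"), (23, 26, "Organisation"), (30, 36, "Person")],
}

print(aggregate(example))
```

In this example the span (30, 36) is dropped because only one service reports it, and the span (23, 26) resolves to the type that two of the three services agree on after mapping. Plain voting of this kind treats all services as equally reliable and requires exact span matches, which is one reason schema-aware and extractor-specific aggregation is of interest.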

Published In

IIWAS '13: Proceedings of International Conference on Information Integration and Web-based Applications & Services
December 2013
753 pages
ISBN:9781450321136
DOI:10.1145/2539150
In-Cooperation

  • @WAS: International Organization of Information Integration and Web-based Applications and Services

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Named Entity Recognition
  2. Probabilistic aggregation

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IIWAS '13
