short-paper

Ensemble Learning of Named Entity Recognition Algorithms using Multilayer Perceptron for the Multilingual Web of Data

Authors:
René Speck, Data Science Group, University of Leipzig, Leipzig, Germany
Axel-Cyrille Ngonga Ngomo, Data Science Group, University of Paderborn, Paderborn, Germany

K-CAP '17: Proceedings of the 9th Knowledge Capture Conference, December 2017, Article No.: 26, Pages 1–4, https://doi.org/10.1145/3148011.3154471

Published: 04 December 2017

ABSTRACT

Implementing the multilingual Semantic Web vision requires transforming unstructured data in multiple languages from the Document Web into structured data for the multilingual Web of Data. We present the multilingual version of FOX, a knowledge extraction suite that supports this migration by providing named entity recognition based on ensemble learning for five languages. Our evaluation results show that our approach outperforms existing named entity recognition systems on all five languages. In our best run, we outperform the state of the art by 32.38% F1-score points on a Dutch dataset. More information, a demo, and an extended version of the paper describing the evaluation in detail can be found at http://fox.aksw.org.

Index Terms

1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction

Published in

K-CAP '17: Proceedings of the 9th Knowledge Capture Conference
December 2017
271 pages
ISBN: 9781450355537
DOI: 10.1145/3148011

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Ensemble Learning
Multilingual
Named Entity Recognition
Semantic Web
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 55 of 198 submissions, 28%