ABSTRACT
Implementing the multilingual Semantic Web vision requires transforming unstructured data in multiple languages from the Document Web into structured data for the multilingual Web of Data. We present the multilingual version of FOX, a knowledge extraction suite that supports this migration by providing named entity recognition based on ensemble learning for five languages. Our evaluation results show that our approach outperforms existing named entity recognition systems on all five languages. In our best run, we outperform the state of the art by 32.38% F1-score points on a Dutch dataset. More information, a demo, and an extended version of the paper describing the evaluation in detail can be found at http://fox.aksw.org.
Index Terms
- Ensemble Learning of Named Entity Recognition Algorithms using Multilayer Perceptron for the Multilingual Web of Data