DOI: 10.1145/3148011.3154471
short-paper

Ensemble Learning of Named Entity Recognition Algorithms using Multilayer Perceptron for the Multilingual Web of Data

Published: 4 December 2017

ABSTRACT

Implementing the multilingual Semantic Web vision requires transforming unstructured data in multiple languages from the Document Web into structured data for the multilingual Web of Data. We present the multilingual version of FOX, a knowledge extraction suite that supports this migration by providing named entity recognition based on ensemble learning for five languages. Our evaluation results show that our approach outperforms existing named entity recognition systems on all five languages. In our best run, we outperform the state of the art by 32.38 percentage points in F1-score on a Dutch dataset. More information and a demo are available at http://fox.aksw.org, together with an extended version of this paper that describes the evaluation in detail.
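The abstract names the technique without showing it. Below is a minimal sketch of what ensemble learning of NER tools with a multilayer perceptron can look like, assuming a stacking setup: each base recognizer emits one label per token, those labels are one-hot encoded and concatenated, and an MLP learns which tool to trust. The label set, the three tools A/B/C, the toy data, and the `TinyMLP` class are hypothetical illustrations, not FOX's implementation, features, or hyperparameters.

```python
import math
import random
from collections import Counter

# Hypothetical label set; the entity types used by FOX may differ.
LABELS = ["PER", "LOC", "ORG", "O"]


def encode(tool_preds):
    """One-hot encode each base tool's label for one token and concatenate them."""
    vec = []
    for label in tool_preds:
        vec.extend(1.0 if label == cand else 0.0 for cand in LABELS)
    return vec


def majority_vote(tool_preds):
    """Baseline: most frequent label; ties go to the earliest tool."""
    counts = Counter(tool_preds)
    best = max(counts.values())
    return next(label for label in tool_preds if counts[label] == best)


class TinyMLP:
    """Hypothetical meta-classifier: one sigmoid hidden layer, softmax output,
    cross-entropy loss, trained with plain stochastic gradient descent."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def _forward(self, x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        top = max(z)  # subtract the max logit for a numerically stable softmax
        e = [math.exp(v - top) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def fit(self, X, y, epochs=300, lr=0.5):
        for _ in range(epochs):
            for x, target in zip(X, y):
                h, p = self._forward(x)
                # Gradient of the cross-entropy loss w.r.t. the output logits.
                d_out = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
                # Back-propagate to the hidden layer before updating w2.
                d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                         * h[j] * (1.0 - h[j]) for j in range(len(h))]
                for k, dk in enumerate(d_out):
                    for j, hj in enumerate(h):
                        self.w2[k][j] -= lr * dk * hj
                    self.b2[k] -= lr * dk
                for j, dj in enumerate(d_hid):
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dj * xi
                    self.b1[j] -= lr * dj
        return self

    def predict(self, x):
        _, p = self._forward(x)
        return max(range(len(p)), key=p.__getitem__)


# Toy data: tool A always matches the gold label, while tools B and C emit
# every possible (often wrong) label, so B and C sometimes outvote A.
data = [((gold, b, c), gold) for gold in LABELS for b in LABELS for c in LABELS]
X = [encode(preds) for preds, _ in data]
y = [LABELS.index(gold) for _, gold in data]

model = TinyMLP(n_in=len(X[0]), n_hidden=8, n_out=len(LABELS)).fit(X, y)
mlp_acc = sum(LABELS[model.predict(x)] == gold for x, (_, gold) in zip(X, data)) / len(data)
vote_acc = sum(majority_vote(preds) == gold for preds, gold in data) / len(data)
print(f"MLP accuracy: {mlp_acc:.3f}, majority-vote accuracy: {vote_acc:.3f}")
```

On this toy data, majority voting fails exactly when tools B and C agree on a wrong label (12 of 64 tokens), whereas the trained MLP can learn to follow the reliable tool. A real system would train on annotated corpora and would more likely rely on an existing machine-learning library than on a hand-written network like this one.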


Published in: K-CAP '17: Proceedings of the 9th Knowledge Capture Conference (ACM Conferences), December 2017, 271 pages
ISBN: 9781450355537
Proceedings DOI: 10.1145/3148011

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States



Qualifiers: short-paper; Research; Refereed limited

Acceptance rates: overall acceptance rate 55 of 198 submissions, 28%
