Evaluation of a Feature Set with Word Embeddings to Improve Named Entity Recognition on Tweets

Büyüktopaç, Onur; Acarman, Tankut

doi:10.1007/978-3-030-19738-4_26

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 977)

Included in the following conference series:

International Conference on Computer Recognition Systems

Abstract

In this paper, we present a Named Entity Recognition (NER) system for tweets and evaluate baseline classifiers. Tweets are informal, noisy texts containing emoticons and abbreviations, which significantly degrade classifier performance. We describe the dataset format and the feature set, then evaluate and test each classifier on different combinations of features. Finally, we identify the most representative set of features. Our experimental results show that the presented system reaches 72% precision, 69% recall, and 69% F1 (micro-averaged).
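As an illustration of the micro-averaged metrics reported above, the following is a minimal sketch that assumes exact-match, entity-level scoring. The abstract does not state the evaluation protocol, so the function name `micro_prf`, the `(tweet_id, start, end, type)` tuple layout, and the sample data are all hypothetical and not taken from the paper.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall, and F1 over entity tuples.

    Each entity is a (tweet_id, start, end, type) tuple; this layout is an
    assumption for illustration. A prediction counts as correct only if all
    four fields match a gold entity exactly.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    # Micro-averaging pools counts over all entity types before dividing.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold annotations and system output for two tweets.
gold = [(0, 0, 1, "PER"), (0, 3, 4, "LOC"), (1, 2, 3, "ORG"), (1, 5, 6, "PER")]
pred = [(0, 0, 1, "PER"), (0, 3, 4, "ORG"), (1, 2, 3, "ORG")]

precision, recall, f1 = micro_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this scoring, a span with the right boundaries but the wrong type counts as both a false positive and a false negative, which is one way precision and recall can differ as they do in the reported figures.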

Acknowledgements

The authors gratefully acknowledge the support of Galatasaray University, scientific research support program under grant #18.401.002.

Author information

Authors and Affiliations

Computer Engineering Department, Galatasaray University, Ortaköy, 34349, İstanbul, Turkey
Onur Büyüktopaç & Tankut Acarman

Corresponding author

Correspondence to Tankut Acarman.

Editor information

Editors and Affiliations

Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Robert Burduk
Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Marek Kurzynski
Faculty of Electronics, Wroclaw University of Science and Technology, Wrocław, Poland
Michał Wozniak

About this paper

Cite this paper

Büyüktopaç, O., Acarman, T. (2020). Evaluation of a Feature Set with Word Embeddings to Improve Named Entity Recognition on Tweets. In: Burduk, R., Kurzynski, M., Wozniak, M. (eds) Progress in Computer Recognition Systems. CORES 2019. Advances in Intelligent Systems and Computing, vol 977. Springer, Cham. https://doi.org/10.1007/978-3-030-19738-4_26

DOI: https://doi.org/10.1007/978-3-030-19738-4_26
Published: 08 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-19737-7
Online ISBN: 978-3-030-19738-4
eBook Packages: Intelligent Technologies and Robotics (R0)
