Evaluation of a Feature Set with Word Embeddings to Improve Named Entity Recognition on Tweets

  • Conference paper
Progress in Computer Recognition Systems (CORES 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 977)

Abstract

In this paper, we present a Named Entity Recognition system for tweets and evaluate baseline classifiers. Tweets are informal, noisy texts containing emoticons and abbreviations, which significantly degrade classifier performance. We describe the dataset format and the feature set, then evaluate and test each classifier on different combinations of features to identify the most representative feature set. Our experimental results show that the presented system reaches 72% precision, 69% recall and 69% F1 (micro average).
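
As a rough illustration of the micro-averaged metrics named in the abstract, the sketch below pools true positives, false positives and false negatives across all entity labels before computing precision, recall and F1. The function name `micro_prf`, the label names, and the choice to ignore the non-entity label `"O"` are assumptions made for this example; the exact averaging procedure used in the paper is not described in this preview.

```python
def micro_prf(gold, pred, ignore=("O",)):
    """Micro-averaged precision, recall and F1 over token labels.

    Counts are pooled across all entity classes; labels listed in
    `ignore` (assumed here to be the non-entity tag "O") do not count
    as predictions or as gold entities.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p not in ignore:
            if p == g:
                tp += 1          # predicted entity label matches gold
            else:
                fp += 1          # predicted entity label is wrong
        if g not in ignore and p != g:
            fn += 1              # gold entity label was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical token-level labels for a five-token tweet.
gold = ["PER", "O", "LOC", "O", "ORG"]
pred = ["PER", "LOC", "O", "O", "PER"]
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Pooling counts before dividing means frequent entity types dominate the score, which is the usual reason to report a micro average alongside per-class results on imbalanced tweet data.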

Acknowledgements

The authors gratefully acknowledge the support of Galatasaray University, scientific research support program under grant #18.401.002.

Author information

Corresponding author

Correspondence to Tankut Acarman.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Büyüktopaç, O., Acarman, T. (2020). Evaluation of a Feature Set with Word Embeddings to Improve Named Entity Recognition on Tweets. In: Burduk, R., Kurzynski, M., Wozniak, M. (eds) Progress in Computer Recognition Systems. CORES 2019. Advances in Intelligent Systems and Computing, vol 977. Springer, Cham. https://doi.org/10.1007/978-3-030-19738-4_26
