DOI: 10.1145/2487788.2488003
research-article

FS-NER: a lightweight filter-stream approach to named entity recognition on Twitter data

Published: 13 May 2013

ABSTRACT

Microblog platforms such as Twitter are increasingly adopted by Web users, yielding an important source of data for web search and mining applications. Tasks such as Named Entity Recognition are at the core of many of these applications, but the effectiveness of existing tools is seriously compromised when they are applied to Twitter data, since messages are terse, poorly worded, and posted in many different languages. Twitter also follows a streaming paradigm, which requires that entities be recognized in real time. In view of these challenges and the inadequacy of existing tools, we propose a novel approach for Named Entity Recognition on Twitter data called FS-NER (Filter-Stream Named Entity Recognition). FS-NER is characterized by the use of filters that process unlabeled Twitter messages, which makes it much more practical than existing supervised CRF-based approaches. Such filters can be flexibly combined either in sequence or in parallel. Moreover, because these filters are not language dependent, FS-NER can be applied to different languages without laborious adaptation. Through a systematic evaluation using three Twitter collections and considering seven types of entity, we show that FS-NER performs 3% better than a CRF-based baseline, besides being orders of magnitude faster and much more practical.
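The abstract does not specify how filters are defined or combined, so the following is only a minimal sketch of what "in sequence" and "in parallel" composition could look like. It assumes that a filter is a function voting on one token position, that sequential combination requires every filter to accept, and that parallel combination requires at least one to accept. The two toy filters and all names (`capitalized`, `after_location_cue`, `in_sequence`, `in_parallel`, `recognize`) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List

# Assumed filter signature: given a tokenized message and a token position,
# return True if the token at that position looks like part of an entity.
Filter = Callable[[List[str], int], bool]


def capitalized(tokens: List[str], i: int) -> bool:
    """Hypothetical toy filter: the token starts with an uppercase letter."""
    return tokens[i][:1].isupper()


def after_location_cue(tokens: List[str], i: int) -> bool:
    """Hypothetical toy filter: the previous token is 'at' or 'in'."""
    return i > 0 and tokens[i - 1].lower() in {"at", "in"}


def in_sequence(*filters: Filter) -> Filter:
    """Assumed semantics: a token is accepted only if every filter accepts it."""
    def combined(tokens: List[str], i: int) -> bool:
        return all(f(tokens, i) for f in filters)
    return combined


def in_parallel(*filters: Filter) -> Filter:
    """Assumed semantics: a token is accepted if at least one filter accepts it."""
    def combined(tokens: List[str], i: int) -> bool:
        return any(f(tokens, i) for f in filters)
    return combined


def recognize(tokens: List[str], flt: Filter) -> List[str]:
    """Return the tokens that the (possibly combined) filter accepts."""
    return [tok for i, tok in enumerate(tokens) if flt(tokens, i)]


tweet = "Watching the game at Wembley tonight".split()
strict = recognize(tweet, in_sequence(capitalized, after_location_cue))
loose = recognize(tweet, in_parallel(capitalized, after_location_cue))
print(strict)
print(loose)
```

Under these assumptions, the sequential combination trades recall for precision, while the parallel one does the opposite; the actual FS-NER filters and their combination rules may differ.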


Published in

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web
May 2013
1636 pages
ISBN: 9781450320382
DOI: 10.1145/2487788

Copyright © 2013. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

WWW '13 Companion Paper Acceptance Rate: 831 of 1,250 submissions, 66%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
