ABSTRACT
Microblog platforms such as Twitter are increasingly adopted by Web users, yielding an important source of data for web search and mining applications. Tasks such as Named Entity Recognition are at the core of many of these applications, but the effectiveness of existing tools is seriously compromised when they are applied to Twitter data, since messages are terse, poorly worded and posted in many different languages. Twitter also follows a streaming paradigm, which requires entities to be recognized in real time. In view of these challenges and the inadequacy of existing tools, we propose a novel approach for Named Entity Recognition on Twitter data called FS-NER (Filter-Stream Named Entity Recognition). FS-NER is characterized by the use of filters that process unlabeled Twitter messages, which makes it much more practical than existing supervised CRF-based approaches. These filters can be flexibly combined either in sequence or in parallel. Moreover, because the filters are not language dependent, FS-NER can be applied to different languages without laborious adaptation. Through a systematic evaluation using three Twitter collections and considering seven types of entity, we show that FS-NER performs 3% better than a CRF-based baseline, besides being orders of magnitude faster and much more practical.
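To make the idea of combining filters in sequence or in parallel concrete, the following is a minimal Python sketch. It is not the FS-NER implementation: the abstract does not define the filters or their combination rules, so the sketch assumes that a filter is a yes/no test on a single token, that a sequential combination accepts a token only if every filter accepts it, and that a parallel combination accepts a token if any filter does. The names `capitalized`, `in_dictionary`, `in_sequence`, `in_parallel` and `recognize` are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence

# ASSUMPTION: a filter looks at one token (by position) in a tokenized
# message and answers whether it could be an entity. This interface is
# illustrative and is not taken from the paper.
Filter = Callable[[Sequence[str], int], bool]


def capitalized(tokens: Sequence[str], i: int) -> bool:
    """Hypothetical filter: accept tokens that start with an uppercase letter."""
    return tokens[i][:1].isupper()


def in_dictionary(names: Iterable[str]) -> Filter:
    """Hypothetical filter factory: accept tokens found in a known-name list."""
    lowered = {name.lower() for name in names}

    def dictionary_filter(tokens: Sequence[str], i: int) -> bool:
        return tokens[i].lower() in lowered

    return dictionary_filter


def in_sequence(*filters: Filter) -> Filter:
    """ASSUMED semantics: a token passes only if every filter accepts it."""
    def combined(tokens: Sequence[str], i: int) -> bool:
        return all(f(tokens, i) for f in filters)

    return combined


def in_parallel(*filters: Filter) -> Filter:
    """ASSUMED semantics: a token passes if at least one filter accepts it."""
    def combined(tokens: Sequence[str], i: int) -> bool:
        return any(f(tokens, i) for f in filters)

    return combined


def recognize(message: str, entity_filter: Filter) -> List[str]:
    """Return the whitespace-separated tokens accepted by the filter."""
    tokens = message.split()
    return [tok for i, tok in enumerate(tokens) if entity_filter(tokens, i)]


known_places = in_dictionary({"Paris", "london"})
strict = in_sequence(capitalized, known_places)
lenient = in_parallel(capitalized, known_places)
```

Under these assumptions, `recognize("I love Paris and london", strict)` keeps only tokens that pass both filters, while the `lenient` combination also keeps tokens that pass just one. Because each token is handled independently and without a trained sequence model, a design like this can process messages one at a time as they arrive, which is the kind of streaming setting the abstract describes.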
Index Terms
- FS-NER: a lightweight filter-stream approach to named entity recognition on twitter data