research-article

FS-NER: a lightweight filter-stream approach to named entity recognition on twitter data

Authors:
Diego Marinho de Oliveira, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Alberto H.F. Laender, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Adriano Veloso, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Altigran S. da Silva, Universidade Federal do Amazonas, Manaus, Brazil

WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web, May 2013, pages 597–604. https://doi.org/10.1145/2487788.2488003

Published: 13 May 2013

ABSTRACT

Microblog platforms such as Twitter are being increasingly adopted by Web users, yielding an important source of data for Web search and mining applications. Tasks such as Named Entity Recognition are at the core of many of these applications, but the effectiveness of existing tools is seriously compromised when they are applied to Twitter data, since messages are terse, poorly worded and posted in many different languages. In addition, Twitter follows a streaming paradigm, which requires entities to be recognized in real time. In view of these challenges and the inadequacy of existing tools, we propose a novel approach for Named Entity Recognition on Twitter data called FS-NER (Filter-Stream Named Entity Recognition). FS-NER is characterized by the use of filters that process unlabeled Twitter messages, which makes it much more practical than existing supervised CRF-based approaches. These filters can be flexibly combined either in sequence or in parallel. Moreover, because the filters are not language dependent, FS-NER can be applied to different languages without laborious adaptation. Through a systematic evaluation using three Twitter collections and seven entity types, we show that FS-NER performs 3% better than a CRF-based baseline, while being orders of magnitude faster and much more practical.
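The abstract states that FS-NER's filters can be combined in sequence or in parallel. The Python sketch below illustrates only those two composition patterns, under loudly stated assumptions: every name in it (`in_sequence`, `in_parallel`, `capitalized`, `not_mention_or_url`, `make_dictionary_filter`, `recognize`) is hypothetical, a filter is modeled as a function that narrows a set of candidate token positions, and the parallel combination simply takes the union of its filters' outputs. It is not FS-NER's actual filter set, scoring, or training procedure.

```python
from typing import Callable, List, Set

# Hypothetical model of a "filter": given a message's tokens and a set of
# candidate positions, return the subset it accepts as possible entity tokens.
Filter = Callable[[List[str], Set[int]], Set[int]]


def in_sequence(*filters: Filter) -> Filter:
    """Chain filters: each one only sees the positions accepted by the previous one."""
    def combined(tokens: List[str], candidates: Set[int]) -> Set[int]:
        for f in filters:
            candidates = f(tokens, candidates)
        return candidates
    return combined


def in_parallel(*filters: Filter) -> Filter:
    """Run filters independently on the same candidates and keep the union (an assumption)."""
    def combined(tokens: List[str], candidates: Set[int]) -> Set[int]:
        accepted: Set[int] = set()
        for f in filters:
            accepted |= f(tokens, candidates)
        return accepted
    return combined


# Illustrative filters (assumptions, not the paper's). None relies on a
# POS tagger or grammar; they only inspect surface features of each token.
def capitalized(tokens: List[str], candidates: Set[int]) -> Set[int]:
    return {i for i in candidates if tokens[i][:1].isupper()}


def not_mention_or_url(tokens: List[str], candidates: Set[int]) -> Set[int]:
    return {i for i in candidates if not tokens[i].startswith(("@", "http"))}


def make_dictionary_filter(entries: Set[str]) -> Filter:
    """Build a filter that accepts tokens found (case-insensitively) in a user-supplied list."""
    lowered = {e.lower() for e in entries}

    def dictionary(tokens: List[str], candidates: Set[int]) -> Set[int]:
        return {i for i in candidates if tokens[i].lower() in lowered}
    return dictionary


def recognize(message: str, pipeline: Filter) -> List[str]:
    """Process one message on its own (no cross-message state) and return accepted tokens in order."""
    tokens = message.split()
    positions = pipeline(tokens, set(range(len(tokens))))
    return [tokens[i] for i in sorted(positions)]
```

For example, with `pipeline = in_sequence(not_mention_or_url, in_parallel(capitalized, make_dictionary_filter({"Rio"})))`, the message `"@bob watching Flamengo play in rio tonight http://t.co/x"` yields `["Flamengo", "rio"]`: the first stage drops the mention and the URL, and the parallel stage keeps the capitalized token plus the dictionary hit. Because each message is handled independently, this toy version hints at why a filter-based design fits streaming input; the real FS-NER filters and the way their outputs are combined are defined in the paper, not here.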

Index Terms

1. Information systems
  1. Information systems applications

Published in
WWW '13 Companion: Proceedings of the 22nd International Conference on World Wide Web
May 2013
1636 pages
ISBN: 9781450320382
DOI: 10.1145/2487788
General Chairs:
Daniel Schwabe, PUC-Rio, Brazil
Virgílio Almeida, UFMG, Brazil
Hartmut Glaser, CGI.br, Brazil
Program Chairs:
Ricardo Baeza-Yates, Yahoo! Labs, Spain & Chile
Sue Moon, KAIST, South Korea
Copyright © 2013 held by the International World Wide Web Conference Committee (IW3C2).
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crf
fs-ner
named entity recognition
twitter
Qualifiers
- research-article
Acceptance Rates
WWW '13 Companion paper acceptance rate: 831 of 1,250 submissions, 66%
Overall acceptance rate: 1,899 of 8,196 submissions, 23%

Article Metrics
- Total citations: 17
- Total downloads: 315
- Downloads (last 12 months): 6
- Downloads (last 6 weeks): 1