poster

On the difficulty of clustering company tweets

Authors:
Fernando Perez-Tellez (Institute of Technology Tallaght Dublin, Dublin, Ireland)
David Pinto (Benemérita Universidad Autónoma de Puebla, Puebla, Mexico)
John Cardiff (Institute of Technology Tallaght Dublin, Dublin, Ireland)
Paolo Rosso (Universidad Politécnica de Valencia, Valencia, Spain)

SMUC '10: Proceedings of the 2nd international workshop on Search and mining user-generated contents, October 2010, Pages 95–102
https://doi.org/10.1145/1871985.1872001

Published: 30 October 2010

ABSTRACT

Twitter is a successful Web 2.0 technology used by millions of people and companies to publish brief messages ("tweets") that share experiences and opinions about products and services. Because of the huge amount of information available on this kind of platform, there is a clear need for new systems that can mine these messages to derive information about the collective thinking of twitterers (e.g. for opinion or sentiment analysis). Tweet analysis is an important task because comments, opinions, suggestions and complaints can inform marketing strategies or help assess a company's reputation. For this purpose, it is necessary to establish whether a tweet refers to a company or not, which is not a straightforward keyword search because a name can be used in multiple contexts. The aim of this work is to present and compare a number of clustering-based approaches that determine whether a given tweet refers to a particular company. To this end, we use an enriching methodology to improve the representation of tweets and, as a consequence, the performance of the company tweet clustering task. The obtained results are promising and highlight the difficulty of this task.
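To make the task concrete, the following is a minimal sketch of splitting tweets that mention an ambiguous company name into two clusters. The abstract does not name the clustering algorithms, the term weighting, or the enrichment method, so everything here is an assumption made for illustration: TF-IDF weights, cosine similarity, a two-cluster k-means with a deterministic start, hypothetical function names (`tfidf_vectors`, `cosine`, `centroid`, `two_means`), and four invented toy tweets. No enrichment step is included.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn whitespace-tokenised documents into sparse TF-IDF dicts."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def two_means(vectors, max_iter=20):
    """Two-cluster k-means using cosine similarity.

    Deterministic start (an assumption of this sketch): the first vector
    and the vector least similar to it serve as initial centroids.
    """
    first = vectors[0]
    second = min(vectors, key=lambda v: cosine(first, v))
    centroids = [first, second]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels


# Invented toy tweets: "apple" the company versus "apple" the fruit.
tweets = [
    "apple iphone launch new iphone store",
    "apple pie recipe with cinnamon",
    "new apple iphone store launch event",
    "baked apple pie cinnamon recipe",
]
labels = two_means(tfidf_vectors(tweets))
print(labels)  # [0, 1, 0, 1]
```

In this toy data the word "apple" occurs in every tweet, so its TF-IDF weight is zero and only the surrounding words separate the two senses. That mirrors the abstract's point that a keyword search on the company name alone is not sufficient. Real tweets are much shorter and noisier than this example, which is presumably why the authors enrich the tweet representation before clustering.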


Index Terms

Information systems
  Information systems applications


Published in
SMUC '10: Proceedings of the 2nd international workshop on Search and mining user-generated contents
October 2010
136 pages
ISBN: 9781450303866
DOI: 10.1145/1871985
General Chairs:
Jose Carlos Cortizo (BrainSins, Spain)
Francisco M. Carrero (BrainSins, Spain)
Ivan Cantador (Autonomous University of Madrid, Spain)
Jose Antonio Troyano (University of Seville, Spain)
Paolo Rosso (Technical University of Valencia, Spain)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering of tweets
opinion analysis
Qualifiers
- poster
Acceptance Rates
SMUC '10 Paper Acceptance Rate: 15 of 25 submissions, 60%
Overall Acceptance Rate: 15 of 25 submissions, 60%