ABSTRACT
Twitter is a new and successful Web 2.0 technology used by millions of people and companies to publish brief messages ("tweets") that share experiences or opinions about a product or service. Given the huge amount of information available on this kind of platform, there is a clear need for new systems that can mine these messages to derive information about the collective thinking of twitterers (e.g. for opinion or sentiment analysis). Tweet analysis is an important task because comments, opinions, suggestions, and complaints can inform marketing strategies or reveal information about a company's reputation. For this purpose, it is necessary to establish whether a tweet refers to a company or not. This is not a straightforward keyword search, because a company name can be used in multiple contexts. The aim of this work is to present and compare a number of clustering-based approaches that determine whether a given tweet refers to a particular company or not. We use an enriching methodology to improve the representation of tweets and, as a consequence, the performance on the task of clustering company tweets. The results obtained are promising and highlight the difficulty of this task.
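To make the clustering idea concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes whitespace tokenization, plain TF-IDF weighting, cosine similarity, and a two-cluster k-means-style loop with deterministic seeding. The toy tweets, the ambiguous name "apple", and all function names are hypothetical, and the enriching step mentioned in the abstract is omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def two_means(vectors, iterations=10):
    """Split vectors into two clusters; returns a list of labels (0 or 1)."""
    # Assumed seeding: the first vector and the vector least similar to it.
    seed_a = vectors[0]
    seed_b = min(vectors[1:], key=lambda v: cosine(seed_a, v))
    centroids = [seed_a, seed_b]
    labels = []
    for _ in range(iterations):
        labels = [
            0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            for v in vectors
        ]
        for k in (0, 1):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = centroid(members)
    return labels

# Hypothetical tweets that all contain the ambiguous name "apple".
tweets = [
    "apple iphone launch new phone",
    "apple pie recipe with cinnamon",
    "new apple iphone price phone",
    "baked apple pie cinnamon dessert",
]
vectors = tfidf_vectors(tweets)
labels = two_means(vectors)
```

Because "apple" occurs in every toy tweet, its TF-IDF weight is zero, so the clusters are formed from the surrounding context words alone. This is the intuition behind treating the problem as clustering rather than keyword search. Real tweets are much shorter and noisier than this example, so the context overlap that the sketch relies on is often missing.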
Index Terms
- On the difficulty of clustering company tweets