DOI: 10.1145/1871985.1872001
poster

On the difficulty of clustering company tweets

Published: 30 October 2010

ABSTRACT

Twitter is a successful new Web 2.0 technology that millions of people and companies use to publish brief messages ("tweets") in order to share experiences and opinions about a product or service. Because of the huge amount of information available through this kind of technology, there is a clear need for new systems that can mine these messages to derive information about the collective thinking of twitterers (e.g. for opinion or sentiment analysis). Tweet analysis is an important task because comments, opinions, suggestions, and complaints can inform marketing strategies or be used to assess a company's reputation. To that end, it is necessary to establish whether a tweet refers to a company or not, which is not a straightforward keyword search because a name may be used in multiple contexts. The aim of this work is to present and compare a number of clustering-based approaches that determine whether a given tweet refers to a particular company. We use an enrichment methodology to improve the representation of tweets and, as a consequence, performance on the task of clustering company tweets. The results obtained are promising and highlight the difficulty of this task.
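
To make the task concrete, here is a minimal sketch of clustering tweets that mention an ambiguous company name. It is not the authors' method: the abstract does not say which clustering algorithm, term weighting, or enrichment technique is used, so TF-IDF weighting and a two-cluster k-means are assumed here as stand-ins, and the enrichment step is omitted. The example tweets and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    # Build unit-length TF-IDF vectors as {term: weight} dicts.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # A term found in every document (here, the company name) gets weight 0.
        vec = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(a, b):
    # Sparse dot product; equals cosine similarity when both are unit vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans_two(vectors, iterations=10):
    # Deterministic seeding (an assumption): the first tweet and the
    # tweet least similar to it.
    seed_b = min(range(len(vectors)), key=lambda i: dot(vectors[0], vectors[i]))
    centroids = [dict(vectors[0]), dict(vectors[seed_b])]
    labels = []
    for _ in range(iterations):
        # Assign each tweet to the centroid with the larger dot product.
        new_labels = [max((0, 1), key=lambda c: dot(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for c in (0, 1):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels

# Invented tweets: two about the company, two about the fruit.
tweets = [
    "new apple iphone launch today",
    "apple iphone battery update released",
    "baked apple pie recipe",
    "fresh apple pie with cinnamon",
]
labels = kmeans_two(tfidf_vectors(tweets))
print(labels)
```

Because "apple" occurs in every tweet, its weight is zero and the grouping depends entirely on the surrounding words. With tweets this short, that context is often too sparse to separate the senses, which is one way to read the difficulty the abstract describes and its motivation for enriching the tweet representation.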

    • Published in

      SMUC '10: Proceedings of the 2nd international workshop on Search and mining user-generated contents
      October 2010
      136 pages
      ISBN: 9781450303866
      DOI: 10.1145/1871985

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SMUC '10 paper acceptance rate: 15 of 25 submissions (60%). Overall acceptance rate: 15 of 25 submissions (60%).
