DOI: 10.1145/2600428.2609612
research-article

Economically-efficient sentiment stream analysis

Published: 03 July 2014

ABSTRACT

Text-based social media channels, such as Twitter, produce torrents of opinionated data about the most diverse topics and entities. The analysis of such data (also known as sentiment analysis) is quickly becoming a key feature in recommender systems and search engines. A prominent approach to sentiment analysis is based on classification techniques: content is classified according to the attitude of the writer. A major challenge, however, is that Twitter follows the data stream model, so classifiers must operate with limited resources, including labeled data and time for building classification models. A further challenge is that the sentiment distribution may change as the stream evolves. In this paper we address these challenges by proposing algorithms that select relevant training instances at each time step, so that training sets are kept small while giving the classifier the ability both to adapt to and to recover from different types of sentiment drift. Providing both capabilities simultaneously, however, is a conflicting-objective problem, and our proposed algorithms employ basic notions of Economics to balance them. We analyzed events that reverberated on Twitter, and a comparison against the state of the art reveals improvements both in error reduction (up to 14%) and in training resources (a reduction of orders of magnitude).
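The instance-selection idea in the abstract can be illustrated with a small sketch. The abstract does not say which notions of Economics are used or how instances are scored, so everything below is an assumption made for illustration: each message carries two hypothetical scores (`adapt` for how much it helps the classifier follow a drift, `recover` for how much it helps the classifier return to an earlier sentiment distribution), and Pareto efficiency is used as the balancing criterion. At each time step the training set keeps only the instances that no other instance beats on both scores.

```python
from typing import List, Tuple

# Hypothetical instance layout: (message id, adapt score, recover score).
# Both scores are illustrative assumptions, not quantities defined in the abstract.
Instance = Tuple[str, float, float]


def dominates(a: Instance, b: Instance) -> bool:
    """True if a is at least as good as b on both scores and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_frontier(pool: List[Instance]) -> List[Instance]:
    """Keep the instances that are not dominated by any other instance in the pool."""
    return [x for x in pool if not any(dominates(y, x) for y in pool)]


def update_training_set(training_set: List[Instance], new_message: Instance) -> List[Instance]:
    """One time step: add the incoming message, then prune dominated instances."""
    return pareto_frontier(training_set + [new_message])


# Toy stream with made-up scores.
stream: List[Instance] = [
    ("msg-1", 0.10, 0.90),
    ("msg-2", 0.40, 0.40),
    ("msg-3", 0.50, 0.60),
    ("msg-4", 0.90, 0.20),
    ("msg-5", 0.30, 0.30),
]

training_set: List[Instance] = []
for message in stream:
    training_set = update_training_set(training_set, message)

kept = [message[0] for message in training_set]
print(kept)
```

In this toy run `msg-2` and `msg-5` are discarded because `msg-3` is better on both scores, which is how a frontier-based rule could keep the training set small while retaining instances that serve either objective.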

        • Published in

          SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
          July 2014
          1330 pages
          ISBN: 9781450322577
          DOI: 10.1145/2600428

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%
          Overall acceptance rate: 792 of 3,983 submissions, 20%
