ABSTRACT
Text-based social media channels, such as Twitter, produce torrents of opinionated data about the most diverse topics and entities. The analysis of such data (a.k.a. sentiment analysis) is quickly becoming a key feature in recommender systems and search engines. A prominent approach to sentiment analysis applies classification techniques: content is classified according to the attitude of the writer. A major challenge, however, is that Twitter follows the data stream model, so classifiers must operate with limited resources, including labeled data and time for building classification models. A further challenge is that the sentiment distribution may change as the stream evolves. In this paper we address these challenges by proposing algorithms that select relevant training instances at each time step, so that training sets are kept small while giving the classifier the capability both to adapt to and to recover from different types of sentiment drift. Providing both capabilities simultaneously, however, is a conflicting-objective problem, and our proposed algorithms employ basic notions of Economics to balance them. We analyzed events that reverberated on Twitter, and a comparison against the state of the art reveals improvements both in error reduction (up to 14%) and in training resources (reduced by orders of magnitude).
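The abstract does not say which notions of Economics are used, so the following is only an illustrative sketch under an assumption: that the two conflicting capabilities (adapting to drift and recovering from it) can each be expressed as a numeric score per candidate training message, and that a Pareto-efficiency criterion is one way to balance them. The names `adapt_score`, `recover_score`, and `pareto_select` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical candidate record: (message id, adapt_score, recover_score).
# Both scores are assumed to be "higher is better"; how they are computed
# is not specified in the abstract and is left out of this sketch.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both scores and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_select(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Toy pool of candidate training messages at one time step (made-up scores).
pool = [
    ("msg-a", 0.9, 0.1),
    ("msg-b", 0.5, 0.5),
    ("msg-c", 0.1, 0.9),
    ("msg-d", 0.4, 0.4),  # dominated by msg-b
    ("msg-e", 0.1, 0.1),  # dominated by every other message
]

selected = pareto_select(pool)
selected_ids = [msg_id for msg_id, _, _ in selected]
print(selected_ids)
```

In this toy pool only the non-dominated messages are retained, which keeps the training set small without sacrificing either objective outright. The pairwise comparison is quadratic in the pool size, which is acceptable for a sketch but would likely need a more efficient skyline computation on a real stream.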
Index Terms
- Economically-efficient sentiment stream analysis