research-article

Economically-efficient sentiment stream analysis

Authors:
Roberto Lourenco Jr. (UFMG, Belo Horizonte, Brazil)
Adriano Veloso (UFMG, Belo Horizonte, Brazil)
Adriano Pereira (UFMG, Belo Horizonte, Brazil)
Wagner Meira Jr. (UFMG, Belo Horizonte, Brazil)
Renato Ferreira (UFMG, Belo Horizonte, Brazil)
Srinivasan Parthasarathy (The Ohio State University, Columbus, USA)

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, July 2014, pages 637–646. https://doi.org/10.1145/2600428.2609612

Published: 03 July 2014

ABSTRACT

Text-based social media channels, such as Twitter, produce torrents of opinionated data about the most diverse topics and entities. The analysis of such data (a.k.a. sentiment analysis) is quickly becoming a key feature of recommender systems and search engines. A prominent approach to sentiment analysis applies classification techniques: content is classified according to the attitude of the writer. A major challenge, however, is that Twitter follows the data stream model, so classifiers must operate with limited resources, including labeled data and the time available for building classification models. A further challenge is that the sentiment distribution may change as the stream evolves. In this paper we address these challenges by proposing algorithms that select relevant training instances at each time step, keeping training sets small while enabling the classifier both to adapt to, and to recover from, different types of sentiment drift. Providing both capabilities simultaneously, however, is a problem with conflicting objectives, and our algorithms employ basic notions of Economics to balance the two. We analyzed events that reverberated on Twitter, and a comparison against the state of the art reveals improvements both in error reduction (up to 14%) and in the training resources required (reduced by orders of magnitude).
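
The abstract does not say which economic notion the algorithms rely on, so what follows is only a hypothetical illustration of one standard notion, Pareto efficiency, applied to a two-objective selection. It assumes each candidate training instance carries two made-up scores, one for how much it helps the classifier adapt to a new drift and one for how much it helps it recover from an earlier one, and it keeps only the candidates that no other candidate beats on both scores. The names `dominates` and `pareto_frontier`, the tuple layout, and the scores are all assumptions for illustration, not the authors' algorithm.

```python
from typing import List, Tuple

# A candidate training instance: (identifier, adaptation_score, recovery_score).
# Both scores are hypothetical and "higher is better"; they stand in for
# whatever two conflicting objectives a real selector would measure.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one (Pareto dominance)."""
    at_least_as_good = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return at_least_as_good and strictly_better


def pareto_frontier(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates.

    Naive O(n^2) scan that preserves input order. A candidate never
    dominates itself, so no self-comparison check is needed.
    """
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    pool = [("w1", 0.9, 0.2), ("w2", 0.5, 0.5), ("w3", 0.4, 0.4), ("w4", 0.2, 0.9)]
    print([c[0] for c in pareto_frontier(pool)])  # ['w1', 'w2', 'w4']
```

In this toy pool, w3 is discarded because w2 scores higher on both objectives, while w1, w2, and w4 each trade one objective against the other and all survive. The quadratic scan is chosen for readability; with two objectives, sorting by one score and sweeping over the other would find the same frontier in O(n log n).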

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval

Published in
SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
July 2014
1330 pages
ISBN: 9781450322577
DOI: 10.1145/2600428
General Chairs: Shlomo Geva (Queensland University of Technology), Andrew Trotman (University of Dunedin)
Program Chairs: Peter Bruza (Queensland University of Technology), Charles L.A. Clarke (University of Waterloo), Kal Järvelin (University of Tampere)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
economic efficiency
sentiment analysis
streams and drifts

Acceptance Rates
SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%
Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics
- Total Citations: 11
- Total Downloads: 621
- Downloads (last 12 months): 5
- Downloads (last 6 weeks): 0