Predicting Poll Trends Using Twitter and Multivariate Time-Series Classification

Mirowski, Tom; Roychoudhury, Shoumik; Zhou, Fang; Obradovic, Zoran

doi:10.1007/978-3-319-47880-7_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10046)

Included in the following conference series: International Conference on Social Informatics

Abstract

Social media outlets such as Twitter provide invaluable information for understanding the social and political climate surrounding particular issues. Millions of people who vary in age, social class, and political beliefs come together in conversation. However, drawing inferences from these tweets is challenging. Using tweets from the 2016 U.S. Presidential campaign, this work addresses one main research question: can changes in a political candidate’s poll score trend be predicted accurately from tweets created during the campaign? The novelty of this work is that we formulate the problem as a multivariate time-series classification problem, which fits the temporal nature of tweets, rather than as a traditional attribute-based classification problem. Features that represent various aspects of support for (or opposition to) a candidate are tracked on an hour-by-hour basis; together they form a multivariate time-series. One commonly used approach to this problem is based on a majority voting scheme, which assumes that the univariate time-series from different features are equally important. To relax this assumption, a weighted shapelet transformation model is proposed. Extensive experiments on over 12 million tweets posted between November 2015 and January 2016 and related to four primary candidates (Bernie Sanders, Hillary Clinton, Donald Trump and Ted Cruz) indicate that the multivariate time-series approach outperforms traditional attribute-based approaches.

References

1. Bermingham, A., Smeaton, A.F.: On using Twitter to monitor political sentiment and predict election results. In: Sentiment Analysis where AI meets Psychology (SAAIP), p. 2 (2011)
2. Gayo-Avello, D.: A meta-analysis of state-of-the-art electoral prediction from Twitter data. Social Science Computer Review, pp. 649–679 (2013)
3. Ghalwash, M., Radosavljevic, V., Obradovic, Z.: Utilizing temporal patterns for estimating uncertainty in interpretable early decision making. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 402–411 (2014)
4. Grabocka, J., Schilling, N., Wistuba, M., Schmidt-Thieme, L.: Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, pp. 392–401. ACM (2014)
5. Graham, T., Jackson, D., Broersma, M.: New platform, old habits? Candidates use of Twitter during the 2010 British and Dutch general election campaigns. New Media Soc. 18(5), 765–783 (2016)
6. Hills, J., Lines, J., Baranauskas, E., Mapp, J., Bagnall, A.: Classification of time series by shapelet transformation. Data Min. Knowl. Disc. 28(4), 851–881 (2014)
7. Larsson, A.O., Moe, H.: Studying political microblogging: Twitter users in the 2010 Swedish election campaign. New Media Soc. 14, 729–747 (2012)
8. Mueen, A., Keogh, E., Young, N.: Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2011, pp. 1154–1162 (2011)
9. O’Connor, B., Balasubramanyan, R., Routledge, B.R., Smith, N.A.: From Tweets to polls: linking text sentiment to public opinion time series. ICWSM 11(122–129), 1–2 (2010)
10. Patri, O.P., Sharma, A.B., Chen, H., Jiang, G., Panangadan, A.V., Prasanna, V.K.: Extracting discriminative shapelets from heterogeneous sensor data. In: 2014 IEEE International Conference on Big Data, Big Data 2014, Washington, DC, USA, 27–30 October 2014, pp. 1095–1104 (2014)
11. Roychoudhury, S., Ghalwash, M.F., Obradovic, Z.: False alarm suppression in early prediction of cardiac arrhythmia. In: 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1–6 (2015)
12. Sang, E.T.K., Bos, J.: Predicting the 2011 Dutch senate election results with Twitter. In: Proceedings of the Workshop on Semantic Analysis in Social Media, pp. 53–60. Association for Computational Linguistics (2012)
13. Shi, L., Agarwal, N., Agrawal, A., Garg, R., Spoelstra, J.: Predicting US primary elections with Twitter (2012)
14. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the social web. J. Am. Soc. Inform. Sci. Technol. 63(1), 163–173 (2012)
15. Tumasjan, A., Sprenger, T.O., Sandner, P.G., Welpe, I.M.: Predicting elections with Twitter: what 140 characters reveal about political sentiment. ICWSM 10, 178–185 (2010)
16. Xing, Z., Pei, J., Yu, P.S., Wang, K.: Extracting interpretable features for early classification on time series. In: SIAM International Conference on Data Mining, pp. 247–258 (2011)
17. Ye, L., Keogh, E.: Time series shapelets: a new primitive for data mining. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2009, pp. 947–956. ACM (2009)

Acknowledgments

This research was supported in part by NSF BIGDATA grant 14476570 and ONR grant N00014-15-1-2729.

Author information

Authors and Affiliations

Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA
Tom Mirowski, Shoumik Roychoudhury, Fang Zhou & Zoran Obradovic

Corresponding author

Correspondence to Zoran Obradovic.

Editor information

Editors and Affiliations

University of Washington, Seattle, Washington, USA
Emma Spiro
Indiana University, Bloomington, Indiana, USA
Yong-Yeol Ahn

Appendix

Learning Time-Series Classification Model (LTS)

LTS [4] is a state-of-the-art univariate time-series classification model. It determines class membership by discovering shapelets [17]: short time-series sub-sequences that serve as local discriminative patterns characterizing the target class. In the LTS model, shapelets are learned jointly with a linear classifier rather than found by searching over all possible time-series segments. More specifically, the algorithm jointly learns the weights of the classifier hyperplane and the generalized shapelets.

A shapelet of length W is a sub-sequence of a time-series instance. A series of length L contains at most $L - W + 1$ such sub-sequences, each of which can be represented as $\{f^q_{i,j},\dots,f^q_{i,j+W-1}\}$. The K shapelets are initialized with the K-Means centroids of all segments.
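To make the segment count concrete, here is a minimal Python sketch that enumerates the candidate sub-sequences of one univariate series. The function name `sliding_segments` and the toy series are hypothetical and only for illustration; the K-Means initialization itself is not shown.

```python
def sliding_segments(series, W):
    """Return every contiguous sub-sequence of length W from one series.

    A series of length L yields L - W + 1 segments (none if W > L).
    """
    L = len(series)
    return [series[j:j + W] for j in range(L - W + 1)]


# Toy series of length L = 6 (illustrative values, not data from the paper).
series = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0]
segments = sliding_segments(series, W=3)
print(len(segments))  # 4, i.e. 6 - 3 + 1
```

Pooling such segments over all training series gives the set that a K-Means initialization would cluster to obtain K starting shapelets.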

Equation 4 represents a linear model, where $M_{i,k}$ is the minimum distance between the i-th series in $T^q$ and the k-th shapelet $S^q_k$.

$$\hat{Y}^q_i = \beta_0 + \sum_{k=1}^{K} M_{i,k}\,\beta_k \qquad \forall i \in \{1,\dots,I\} \tag{4}$$

The minimum distance $M_{i,k}$ is the predictor in this framework for shapelet learning and can be defined by a soft-minimum function:

$$M_{i,k} = \frac{\sum_{j=1}^{L-W+1} D_{i,k,j}\, e^{\alpha D_{i,k,j}}}{\sum_{j^\prime=1}^{L-W+1} e^{\alpha D_{i,k,j^\prime}}} \tag{5}$$

where $D_{i,k,j}$ is defined as the distance between the $j^{th}$ segment of series i and the $k^{th}$ shapelet given by the formula

$$D_{i,k,j} = \frac{1}{W}\sum_{w=1}^{W} \left(T^q_{i,j+w-1} - S^q_{k,w}\right)^2 \tag{6}$$
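The two definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the toy data, and the value `alpha = -30.0` are assumptions. The text does not state a value for $\alpha$; a negative value is assumed here, because only then does the weighted average concentrate on the smallest distance.

```python
import math


def segment_distances(series, shapelet):
    """Eq. 6: mean squared difference between the shapelet and each segment."""
    W = len(shapelet)
    return [
        sum((series[j + w] - shapelet[w]) ** 2 for w in range(W)) / W
        for j in range(len(series) - W + 1)
    ]


def soft_min(distances, alpha=-30.0):
    """Eq. 5: soft-minimum of the segment distances (alpha < 0 assumed).

    Subtracting the smallest distance inside the exponent leaves the ratio
    unchanged and keeps every exponent <= 0, which avoids overflow.
    """
    d_min = min(distances)
    weights = [math.exp(alpha * (d - d_min)) for d in distances]
    return sum(d * w for d, w in zip(distances, weights)) / sum(weights)


# Toy series containing an exact copy of the shapelet at offset j = 2.
series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
shapelet = [1.0, 2.0, 1.0]

D = segment_distances(series, shapelet)
M = soft_min(D)
print(D)
print(M)
```

Because the soft-minimum is a positively weighted average of the distances, it never falls below the true minimum; in this toy case the exact match gives a distance of 0 and `M` is very close to 0.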

Equation 7 shows the regularized objective function, composed of the logistic loss defined by Eq. 8 and a regularization term on the classifier weights.

$$\operatorname*{arg\,min}_{S,\beta} F(S,\beta) = \operatorname*{arg\,min}_{S,\beta} \sum_{i=1}^{I} \mathcal{L}(Y^q_i,\hat{Y}^q_i) + \lambda_\beta \|\beta\|^2 \tag{7}$$

$$\mathcal{L}(Y^q_i,\hat{Y}^q_i) = -Y^q_i \ln\sigma(\hat{Y}^q_i) - (1-Y^q_i)\ln\left(1-\sigma(\hat{Y}^q_i)\right) \tag{8}$$
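As a rough illustration of how Eqs. 4, 7 and 8 fit together, the following Python sketch evaluates the objective for fixed soft-minimum distances. All names and numbers are hypothetical. Two assumptions are made: labels are coded as 0/1, as the form of Eq. 8 suggests, and the bias $\beta_0$ is left out of the penalty. The loss is written literally and is not hardened against very large scores.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(M_i, beta0, beta):
    """Eq. 4: linear model over the K soft-minimum distances of one series."""
    return beta0 + sum(m * b for m, b in zip(M_i, beta))


def logistic_loss(y, y_hat):
    """Eq. 8, with y in {0, 1} (assumed label coding)."""
    p = sigmoid(y_hat)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)


def objective(M, Y, beta0, beta, lam):
    """Eq. 7: summed logistic loss plus lam * ||beta||^2 (bias not penalized here)."""
    data_term = sum(
        logistic_loss(y, linear_score(M_i, beta0, beta)) for M_i, y in zip(M, Y)
    )
    return data_term + lam * sum(b * b for b in beta)


# Two series, K = 2 shapelets: hypothetical soft-minimum distances and labels.
M = [[0.1, 1.2], [0.9, 0.05]]
Y = [1, 0]

baseline = objective(M, Y, beta0=0.0, beta=[0.0, 0.0], lam=0.1)
print(baseline)  # 2 * ln(2): every prediction is 0.5 when all weights are zero

fitted = objective(M, Y, beta0=0.0, beta=[-2.0, 2.0], lam=0.1)
print(fitted)
```

In the full model the distances in `M` depend on the shapelets, so a gradient step would update both `beta` and the shapelet values; this sketch only evaluates the objective.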

The objective in Eq. 7 is minimized using stochastic gradient descent, which jointly learns the weights $\beta $ and the shapelets $S^q$. Once the model is learned, an unknown instance is classified by computing $\hat{Y}^q_t$ for the t-th test instance of the q-th feature and determining the class label via Eq. 9:

$$\hat{Y}^q_t \leftarrow \operatorname*{arg\,max}_{c \in \{1,-1\}}\; \sigma(\hat{Y}^q_{t,c}), \tag{9}$$

where $\sigma (\cdot )$ denotes the sigmoid function.
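One possible reading of Eq. 9 is sketched below in Python, assuming that a separate linear score is available for each class label. The dictionary `scores` and the function `predict_label` are hypothetical names introduced only for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(scores_by_class):
    """Eq. 9: return the class whose score has the largest sigmoid value."""
    return max(scores_by_class, key=lambda c: sigmoid(scores_by_class[c]))


# Hypothetical per-class scores for one test instance.
scores = {1: 0.8, -1: -0.8}
print(predict_label(scores))  # 1
```

Since the sigmoid is monotonically increasing, the selected class is the same one that maximizes the raw score; the sigmoid only maps the scores to the interval (0, 1).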

For more details about individual gradient computation of the objective function, the reader is referred to [4].

Cite this paper

Mirowski, T., Roychoudhury, S., Zhou, F., Obradovic, Z. (2016). Predicting Poll Trends Using Twitter and Multivariate Time-Series Classification. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science(), vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_17

DOI: https://doi.org/10.1007/978-3-319-47880-7_17
Published: 23 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47879-1
Online ISBN: 978-3-319-47880-7
eBook Packages: Computer Science, Computer Science (R0)
