Abstract
The complex, ever-shifting landscape of social media can obscure important changes in conversations involving smaller groups. Discovering these subtle shifts in attention to topics can be challenging for algorithms attuned to global topic popularity. We present a novel unsupervised method to identify shifts in high-dimensional textual data. The method treats randomly selected date-time instances as trial change points in the discourse, automatically labels each data point as falling before or after the trial change point, and trains a classifier to predict these labels. Next, it fits a mathematical model of classification accuracy to all trial change points to infer the true change points, as well as the fraction of data affected (a proxy for detection confidence). Finally, it splits the data at the detected change and repeats recursively until a stopping criterion is reached. The method outperforms state-of-the-art change detection algorithms in accuracy, and often has lower time and space complexity. It identifies meaningful changes in real-world settings, including Twitter conversations about the COVID-19 pandemic and stories posted on Reddit. Its flexibility, accuracy and robustness in identifying changes in high-dimensional data open new avenues for data-driven discovery.
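The steps described in the abstract (trial change points, before/after labels, a classifier, and a fitted accuracy model) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single change point, synthetic Gaussian features in place of text, and a small gradient-descent logistic regression as the classifier. The accuracy model is also an assumption made for illustration: excess accuracy over the majority-class baseline is taken to be `alpha` times the excess of an ideal classifier that can only tell which side of the true change `t0` a point lies on. All names (`heldout_accuracy`, `separable_accuracy`, `detect_change`) are hypothetical, and the recursive splitting step is omitted.

```python
import numpy as np

def heldout_accuracy(X, y, rng, steps=400, lr=0.2):
    """Train logistic regression by gradient descent on a random half of
    the data and return its accuracy on the other half."""
    idx = rng.permutation(len(y))
    tr, te = idx[: len(y) // 2], idx[len(y) // 2:]
    mu = X[tr].mean(axis=0)                      # center with training mean
    Xtr, Xte = X[tr] - mu, X[te] - mu
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))
        grad = p - y[tr]
        w -= lr * Xtr.T @ grad / len(tr)
        b -= lr * grad.mean()
    pred = (Xte @ w + b > 0).astype(int)
    return float((pred == y[te]).mean())

def separable_accuracy(t, t0):
    """Assumed ideal accuracy at trial fraction t when a classifier can
    only tell which side of the true change t0 a point lies on."""
    before = 1.0 - t0 + np.maximum(t, t0 - t)    # trial point at or before t0
    after = t0 + np.maximum(t - t0, 1.0 - t)     # trial point after t0
    return np.where(t <= t0, before, after)

def detect_change(X, n_trials=40, seed=0):
    """Return (t0, alpha): the estimated change location as a fraction of
    the time-ordered rows of X, and the fitted fraction of data affected."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trials = np.sort(rng.uniform(0.05, 0.95, n_trials))  # random trial points
    acc = np.empty(n_trials)
    for i, t in enumerate(trials):
        y = (np.arange(n) >= int(t * n)).astype(float)   # 0 = before, 1 = after
        acc[i] = heldout_accuracy(X, y, rng)
    base = np.maximum(trials, 1.0 - trials)      # majority-class accuracy
    excess = acc - base
    best = (np.inf, None, None)
    for t0 in np.linspace(0.05, 0.95, 91):       # grid search over t0
        d = separable_accuracy(trials, t0) - base
        alpha = float(d @ excess / max(d @ d, 1e-12))    # least-squares alpha
        resid = float(np.sum((excess - alpha * d) ** 2))
        if resid < best[0]:
            best = (resid, float(t0), alpha)
    return best[1], best[2]

# Synthetic, time-ordered data: the mean of five features shifts after 60% of rows.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(600, 10))
X[360:, :5] += 3.0
t0_hat, alpha_hat = detect_change(X)
print(f"estimated change at fraction {t0_hat:.2f}, alpha = {alpha_hat:.2f}")
```

Subtracting the majority-class baseline matters in this sketch because a trial point near either end of the timeline yields highly imbalanced labels, so raw accuracy is high there even when nothing has changed.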
Acknowledgments
This work was funded in part by DARPA (W911NF-17-C-0094 and HR00111990114) and AFOSR (FA9550-20-1-0224).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
He, Y., Rao, A., Burghardt, K., Lerman, K. (2021). Identifying Shifts in Collective Attention to Topics on Social Media. In: Thomson, R., Hussain, M.N., Dancy, C., Pyke, A. (eds) Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2021. Lecture Notes in Computer Science, vol 12720. Springer, Cham. https://doi.org/10.1007/978-3-030-80387-2_22
DOI: https://doi.org/10.1007/978-3-030-80387-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80386-5
Online ISBN: 978-3-030-80387-2
eBook Packages: Computer Science, Computer Science (R0)