Abstract
Predicting repost behavior in social media networks plays an important role in human activity analysis and in influence-maximization decision making. Traditional approaches to repost prediction fall into two categories: stochastic diffusion-based models, and machine learning models built on user-profile or content features. In this paper, we propose a new framework that combines user profiles, content similarity, and the neighborhood information around each target link as input features for the prediction. Here, neighborhood information can be interpreted as a combination of the neighbors' user profiles. Two different kinds of graph-based combination models are introduced in this paper. After collecting the input features, we apply state-of-the-art machine learning methods, e.g., Logistic Regression, K-Nearest Neighbors, Gaussian Naive Bayes, Deep Neural Network, Random Forest, XGBoost and a Stacking Model, to predict the repost probability. We evaluate our model on a real-world Weibo dataset and compare its performance across different feature sets and machine learning methods.
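To make the feature construction concrete, here is a minimal Python sketch of the input vector for a single target link. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify its two graph-based combination models, so the neighborhood feature here is simply the element-wise mean of the target user's neighbors' profile vectors, and "content similarity" is assumed to be the cosine similarity between the post's sentence embedding and an embedding summarizing the target user's past posts. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embeddings; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def neighborhood_profile(user: str,
                         graph: Dict[str, List[str]],
                         profiles: Dict[str, Vector]) -> Vector:
    """ASSUMPTION: combine the neighbors' profiles by a plain element-wise mean.

    A user with no neighbors gets a zero vector the same length as their own profile.
    """
    dim = len(profiles[user])
    neighbors = graph.get(user, [])
    if not neighbors:
        return [0.0] * dim
    return [sum(profiles[n][i] for n in neighbors) / len(neighbors)
            for i in range(dim)]


def link_features(source: str,
                  target: str,
                  graph: Dict[str, List[str]],
                  profiles: Dict[str, Vector],
                  post_embedding: Vector,
                  history_embeddings: Dict[str, Vector]) -> Vector:
    """Hypothetical input vector for the link source -> target for one post:
    [source profile | target profile | content similarity | target's neighborhood profile]."""
    # ASSUMPTION: content similarity = cosine(post, summary embedding of target's past posts).
    similarity = cosine_similarity(post_embedding, history_embeddings[target])
    return (list(profiles[source]) + list(profiles[target]) + [similarity]
            + neighborhood_profile(target, graph, profiles))


if __name__ == "__main__":
    # Toy data: 2-dimensional profiles and embeddings, made up for illustration.
    profiles = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [2.0, 2.0]}
    graph = {"b": ["a", "c"]}      # neighbors of user "b"
    post = [1.0, 0.0]              # embedding of the post published by "a"
    history = {"b": [1.0, 1.0]}    # embedding summarizing "b"'s past posts
    print(link_features("a", "b", graph, profiles, post, history))
```

Running it prints a 7-element vector: the two 2-dimensional profiles, one similarity score, and the 2-dimensional neighborhood mean. Any of the classifiers named above could consume such vectors; swapping the mean for another aggregation (sum, max, attention-weighted) only changes `neighborhood_profile`.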
Appendix
A Prediction Performance Measures
Measure | Definition | Formula |
---|---|---|
Accuracy | The ratio of correctly predicted observations to the total number of observations | \(\frac{TP+TN}{TP+FP+FN+TN}\) |
Precision | The ratio of correctly predicted positive observations to all predicted positive observations | \(\frac{TP}{TP+FP}\) |
Recall | The ratio of correctly predicted positive observations to all observations in the actual positive class | \(\frac{TP}{TP+FN}\) |
F1 Score | The harmonic mean of Precision and Recall | \(\frac{2 \cdot Precision \cdot Recall}{Precision+Recall}\) |
ROC AUC | The area under the receiver operating characteristic (ROC) curve, which plots the True Positive Rate against the False Positive Rate as the decision threshold on the prediction scores varies | – |

TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
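The formulas in the table can be checked with a small, dependency-free Python sketch. The function names are hypothetical, and the paper does not say how its metrics were computed (a library such as scikit-learn would be the usual choice). Two conventions here are assumptions: labels are 1 for "repost" and 0 for "no repost", and a metric whose denominator is zero is reported as 0.0. ROC AUC is computed through its equivalent pairwise (Mann-Whitney) form, the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels (1 = repost, 0 = no repost)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 as defined in the table.

    ASSUMPTION: a metric with a zero denominator is reported as 0.0.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) form.

    Counts the (positive, negative) pairs where the positive scores higher,
    with ties worth 1/2. O(P * N) time, which is fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    # TP = 2, TN = 2, FP = 1, FN = 1, so each metric is 2/3 (up to rounding).
    print(classification_metrics(y_true, y_pred))
    # 8 of the 9 positive-negative pairs are ranked correctly, so AUC = 8/9.
    print(roc_auc(y_true, scores))
```

The pairwise form is used because it needs no explicit threshold sweep; for large datasets a sort-based implementation is preferable.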
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Qiang, Z., Pasiliao, E.L., Semenov, A., Zheng, Q.P. (2023). Incorporating Neighborhood Information and Sentence Embedding Similarity into a Repost Prediction Model in Social Media Networks. In: Dinh, T.N., Li, M. (eds) Computational Data and Social Networks. CSoNet 2022. Lecture Notes in Computer Science, vol 13831. Springer, Cham. https://doi.org/10.1007/978-3-031-26303-3_1
DOI: https://doi.org/10.1007/978-3-031-26303-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26302-6
Online ISBN: 978-3-031-26303-3
eBook Packages: Computer Science, Computer Science (R0)