Abstract
Information propagation within the blogosphere is important for implementing policies, conducting marketing research, launching new products, and other applications. In this paper, we take a microscopic view of the information propagation pattern in the blogosphere by investigating blog cascade affinity. A blog cascade is a group of linked posts that discuss the same topic, and cascade affinity refers to a blog's inclination to join a specific cascade. We identify and analyze an array of macroscopic and microscopic content-oblivious features that may affect a blogger's cascade-joining behavior, and we use these features to predict the cascade affinity of blogs. Based on these features, we present two strategies: a non-probabilistic approach based on support vector machine (SVM) classification and a probabilistic approach based on a Bipartite Markov Random Field (BiMRF). Both predict the probability of blogs' affinity to a cascade and rank the blogs accordingly. We evaluate them on a real dataset of 873,496 posts. Our prediction strategies generate high-quality results, with an \(F1\)-measure of 72.5 % for SVM and 71.1 % for BiMRF. By comparison, approaches that use only a single traditional feature achieve around 11.2 % (elapsed time) and 8.9 % (number of participants). Our experiments also show that, among all the features identified, the number of quasi-friends is the most important factor affecting bloggers' inclination to join cascades.
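As a rough illustration of the ranking-and-evaluation step the abstract describes, the sketch below ranks candidate blogs by a predicted affinity score and scores the resulting top-k list with the F1-measure (the harmonic mean of precision and recall). It assumes the scores have already been produced by some model, such as the SVM or BiMRF approach. The blog identifiers, the scores, the cutoff k, and the helper names (`rank_by_affinity`, `f1_measure`, `evaluate_top_k`) are hypothetical. The paper's actual evaluation protocol may differ.

```python
from typing import Dict, List, Set, Tuple


def rank_by_affinity(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort candidate blogs by predicted affinity score, highest first.

    Ties are broken by blog id so the ranking is deterministic.
    """
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def f1_measure(predicted: Set[str], actual: Set[str]) -> float:
    """Harmonic mean of precision and recall over two sets of blog ids."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0  # also covers empty predicted or actual sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def evaluate_top_k(scores: Dict[str, float], actual: Set[str], k: int) -> float:
    """Treat the k highest-scoring blogs as predicted joiners (an assumed protocol) and return F1."""
    predicted = {blog for blog, _ in rank_by_affinity(scores)[:k]}
    return f1_measure(predicted, actual)


if __name__ == "__main__":
    # Hypothetical affinity scores for four candidate blogs and one cascade.
    scores = {"blogA": 0.91, "blogB": 0.40, "blogC": 0.77, "blogD": 0.10}
    joined = {"blogA", "blogB"}  # blogs that actually joined the cascade
    for k in (2, 3):
        print(k, round(evaluate_top_k(scores, joined, k), 3))
```

With these toy scores, the top two blogs (blogA, blogC) contain one true joiner, giving precision 0.5, recall 0.5, and F1 = 0.5. The top three add blogB, giving precision 2/3, recall 1, and F1 = 0.8. A fixed cutoff k is only one choice. Thresholding the predicted probability is an equally plausible alternative.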
Acknowledgments
Part of this work was done while the first author was pursuing a PhD at the School of Computer Engineering, Nanyang Technological University, Singapore. This work is partly supported by NSFC grants 61202179 and 61173089.
Additional information
Responsible editor: Bing Liu.
About this article
Cite this article
Li, H., Bhowmick, S.S., Sun, A. et al. Affinity-driven blog cascade analysis and prediction. Data Min Knowl Disc 28, 442–474 (2014). https://doi.org/10.1007/s10618-013-0307-0