Affinity-driven blog cascade analysis and prediction

Data Mining and Knowledge Discovery

Abstract

Information propagation within the blogosphere is of great importance in implementing policies, conducting marketing research, launching new products, and other applications. In this paper, we take a microscopic view of the information propagation pattern in the blogosphere by investigating blog cascade affinity. A blog cascade is a group of linked posts discussing the same topic, and cascade affinity refers to a blog’s inclination to join a specific cascade. We identify and analyze an array of macroscopic and microscopic content-oblivious features that may affect a blogger’s cascade-joining behavior and use these features to predict the cascade affinity of blogs. Based on these features, we present two strategies, one non-probabilistic and one probabilistic, namely a support vector machine (SVM) classification-based approach and a Bipartite Markov Random Field-based (BiMRF) approach, respectively, to predict the probability of blogs’ affinity to a cascade and rank them accordingly. Evaluated on a real dataset consisting of 873,496 posts, our prediction strategies generate high-quality results (\(F1\)-measure of 72.5 % for SVM and 71.1 % for BiMRF) compared with approaches that use only a single traditional feature, such as elapsed time or number of participants, whose \(F1\)-measures are around 11.2 % and 8.9 %, respectively. Our experiments also show that, among all the features identified, the number of quasi-friends is the most important factor affecting bloggers’ inclination to join cascades.

Notes

  1. http://technorati.com

  2. A shorter version of this work has been published in (Li et al. 2009).

  3. http://technorati.com/developers/api

  4. We adopted the method described in Chekuri et al. (2006) for fitting power-law distributions.

    Table 2 Statistics of the data set

References

  • Adams B, Phung DQ, Venkatesh S (2010) Discovery of latent subcommunities in a blog’s readership. TWEB 4(3):12:1–12:30

  • Agarwal N, Liu H, Tang L, Yu PS (2008) Identifying the influential bloggers in a community. In: WSDM ’08: Proceedings of the 1st ACM international conference on web search and data mining, pp 207–218

  • Backstrom L, Huttenlocher DP, Kleinberg JM, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 44–54

  • Bao H, Chang EY (2010) Adheat: an influence-based diffusion model for propagating hints to match ads. In: WWW ’10: Proceedings of the 19th international conference on World wide web, pp 71–80

  • Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  • Bikhchandani S, Hirshleifer D, Welch I (1992) A theory of fads, fashion, custom, and cultural change as informational cascades. J Political Econ 100(5):992–1026

  • Cha M, Mislove A, Gummadi PK (2009) A measurement-driven analysis of information propagation in the Flickr social network. In: WWW ’09: Proceedings of the 18th international conference on World wide web, pp 721–730

  • Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm. Accessed 10 Feb 2013

  • Chekuri C, Even G, Kortsarz G (2006) A greedy approximation algorithm for the group Steiner problem. Discret Appl Math 154(1):15–34

  • Chen H, Tiño P, Yao X (2009b) Predictive ensemble pruning by expectation propagation. IEEE Trans Knowl Data Eng 21(7):999–1013

  • Chen D, Tang J, Li J, Zhou L (2009a) Discovering the staring people from social networks. In: WWW ’09: Proceedings of the 18th international conference on World wide web, pp 1219–1220

  • Clements M, De Vries AP, Reinders MJT (2010) The task-dependent effect of tags and ratings on social media access. ACM Trans Inf Syst 28:21:1–21:42

  • Davidson I, Gilpin S, Walker PB (2012) Behavioral event data and their analysis. Data Min Knowl Discov 25(3):635–653

  • Dodds PS, Watts DJ (2004) Universal behavior in a generalized model of contagion. Phys Rev Lett 92(21):218701

  • Goyal A, Bonchi F, Lakshmanan Laks VS (2012) A data-based approach to social influence maximization. PVLDB 5(1):73–84

  • Gruhl D, Guha RV, Liben-Nowell D, Tomkins A (2004) Information diffusion through blogspace. In: WWW ’04: Proceedings of the 13th international conference on World wide web, pp 491–501

  • Guice SL (1995) Creating communities of readers: a study of children’s information networks as multiple contexts for responding to texts. J Lit Res 27(3):379–397

  • Hartline JD, Mirrokni VS, Sundararajan M (2008) Optimal marketing strategies over social networks. In: WWW ’08: Proceedings of the 17th international conference on World wide web, pp 189–198

  • Iribarren JL, Moro E (2009) Impact of human activity patterns on the dynamics of information diffusion. Phys Rev Lett 103(3):038702

  • Karagiannis T, Vojnovic M (2009) Behavioral profiles for advanced email features. In: WWW ’09: Proceedings of the 18th international conference on World wide web, pp 711–720

  • Kempe D, Kleinberg JM, Tardos É (2003) Maximizing the spread of influence through a social network. In: KDD ’03: Proceedings of the 9th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 137–146

  • Kimura M, Saito K, Motoda H (2009) Blocking links to minimize contamination spread in a social network. ACM Trans Knowl Discov Data 3:9:1–9:23

  • Kumar R, Novak J, Raghavan P, Tomkins A (2003) On the bursty evolution of blogspace. In: WWW ’03: Proceedings of the 12th international conference on World wide web, pp 568–576

  • Lee C, Kwak H, Park H, Moon SB (2010) Finding influentials based on the temporal order of information adoption in Twitter. In: WWW ’10: Proceedings of the 19th international conference on World wide web, pp 1137–1138

  • Lerman K, Hogg T (2010) Using a model of social dynamics to predict popularity of news. In: WWW ’10: Proceedings of the 19th international conference on World wide web, ACM, New York, NY, USA, pp 621–630

  • Leskovec J, Adamic LA, Huberman BA (2006) The dynamics of viral marketing. In: EC ’06: Proceedings of the 7th ACM conference on Electronic commerce, ACM, New York, NY, USA, pp 228–237

  • Leskovec J, Adamic LA, Huberman BA (2007a) The dynamics of viral marketing. TWEB 1(1): Article 5. doi:10.1145/1232722.1232727

  • Leskovec J, Backstrom L, Kumar R, Tomkins A (2008) Microscopic evolution of social networks. In: KDD ’08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 462–470

  • Leskovec J, McGlohon M, Faloutsos C, Glance N, Hurst M (2007b) Cascading behavior in large blog graphs: Patterns and a model. In: SDM ’07: Society of Applied and Industrial Mathematics: Data Mining

  • Li H, Bhowmick SS, Sun A (2009) Blog cascade affinity: analysis and prediction. In: CIKM ’09: Proceeding of the 18th ACM conference on Information and knowledge management, ACM, New York, NY, USA, pp 1117–1126

  • Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45(3):503–528

  • Ma H, Yang H, Lyu MR, King I (2008) Mining social networks using heat diffusion processes for marketing candidates selection. In: CIKM ’08: Proceeding of the 17th ACM conference on Information and knowledge management, pp 233–242

  • McGlohon M, Leskovec J, Faloutsos C, Hurst M, Glance N (2007) Finding patterns in blog shapes and blog evolution. In: International Conference on Weblogs and Social Media, Boulder, Colo

  • Newman MEJ (2002) Spread of epidemic disease on networks. Phys Rev E 66(1):016128

  • Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167–256

  • Pal A, Counts S (2011) Identifying topical authorities in microblogs. In: WSDM ’11: Proceedings of the Fourth International Conference on Web Search and Web Data Mining, ACM, New York, NY, USA, pp 45–54

  • Pastor-Satorras R, Vespignani A (2002) Epidemics and immunization in scale-free networks. ArXiv Condensed Matter e-prints/0205260

  • Rogers EM (2003) Diffusion of innovations, 5th edn. Free Press, New York

  • Satorras RP, Vespignani A (2001) Epidemic spreading in scale-free networks. Phys Rev Lett 86(14):3200–3203

  • Shi X, Zhu J, Cai R, Zhang L (2009) User grouping behavior in online forums. In: KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, New York, NY, USA, pp 777–786

  • Stewart A, Chen L, Paiu R, Nejdl W (2007) Discovering information diffusion paths from blogosphere for online advertising. In: ADKDD ’07: Proceedings of the 1st international workshop on Data mining and audience intelligence for advertising, ACM, New York, NY, USA, pp 46–54

  • Strang D, Soule S (1998) Diffusion in organizations and social movements: from hybrid corn to poison pills. Annu Rev Sociol 24:265–290

  • Technorati (2008) State of the blogosphere. Tech Rep http://www.technorati.com/blogging/state-of-the-blogosphere/. Accessed 3 Mar 2010

  • Wang Y, Chakrabarti D, Wang C, Faloutsos C (2003) Epidemic spreading in real networks: An eigenvalue viewpoint. IEEE Symposium on Reliable Distributed Systems 0:25+

  • Wang Y, Cong G, Song G, Xie K (2010) Community-based greedy algorithm for mining top-K influential nodes in mobile social networks. In: KDD ’10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, New York, NY, USA, pp 1039–1048

  • Watts D (2002) A simple model of global cascades on random networks. P Natl Acad Sci USA 99(9):5766–5771

  • Watts DJ, Dodds PS (2007) Influentials, networks, and public opinion formation. J Consumer Res 34:441–458

Acknowledgments

Part of this work was done while the first author was pursuing a PhD in the School of Computer Engineering, Nanyang Technological University, Singapore. This work is partly supported by NSFC 61202179 and 61173089.

Author information

Corresponding author

Correspondence to Hui Li.

Additional information

Responsible editor: Bing Liu.

About this article

Cite this article

Li, H., Bhowmick, S.S., Sun, A. et al. Affinity-driven blog cascade analysis and prediction. Data Min Knowl Disc 28, 442–474 (2014). https://doi.org/10.1007/s10618-013-0307-0

  • DOI: https://doi.org/10.1007/s10618-013-0307-0
