Improving network response times using social information

  • Original Article
Social Network Analysis and Mining

Abstract

Social networks and discussion boards have become significant outlets for people to communicate and freely express their opinions. Although the social networks themselves are usually well-provisioned, participating users frequently link to external websites to substantiate their discussions. The heavy traffic load suddenly imposed on these externally linked websites can make them unresponsive, a phenomenon known as the “flash crowd effect.” Flash crowds present a real challenge because their intensity and occurrence times are impossible to predict. Moreover, most present-day web hosting servers and caching systems, although increasingly capable, are designed to handle only a nominal load of requests; beyond that load they become unresponsive because of the limited bandwidth or processing power allocated to the hosting site. In this paper, we quantify the prevalence of flash crowd events for a popular social discussion board (Digg). Using PlanetLab, we measured the response times of 1,289 unique popular websites and verified that 89% of the popular URLs suffered variations in their response times. In an effort to identify flash crowds in advance, we evaluated and compared traffic forecasting mechanisms. We showed that predicting network traffic from network measurements alone has very limited success and cannot be used for large-scale prediction. However, by analyzing the content and structure of social discussions, we were able to accurately forecast popularity for 86% of the websites within 5 min of a story’s submission and for 95% of the sites when more social content (5 h worth) became available. Our work indicates that social activity can be leveraged effectively to forecast network events that would otherwise be infeasible to anticipate.
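As a rough illustration of what "variation in response time" could mean in practice, the Python sketch below flags a URL when any later measurement exceeds a multiple of its early baseline. The abstract does not state the criterion the authors used, so the function names, the `baseline_count` window, and the `factor` threshold are all hypothetical assumptions, not the paper's method.

```python
from statistics import median


def has_response_time_variation(samples_ms, baseline_count=3, factor=2.0):
    """Return True if any sample after the baseline window exceeds
    `factor` times the median of the first `baseline_count` samples.

    Assumption: the baseline window and the 2x threshold are illustrative
    choices, not values taken from the paper.
    """
    if len(samples_ms) <= baseline_count:
        raise ValueError("need more samples than baseline_count")
    baseline = median(samples_ms[:baseline_count])
    return any(s > factor * baseline for s in samples_ms[baseline_count:])


def fraction_with_variation(series_by_url, **kwargs):
    """Fraction of URLs whose response-time series is flagged as varying."""
    flagged = sum(
        has_response_time_variation(series, **kwargs)
        for series in series_by_url.values()
    )
    return flagged / len(series_by_url)
```

Given one response-time series per URL (in milliseconds), `fraction_with_variation` yields the share of flagged URLs, which is the kind of aggregate the 89% figure summarizes. The example URLs and timings used to exercise it would be made-up inputs, not measured data.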


Notes

  1. Digg: http://digg.com/.

  2. Reddit: http://www.reddit.com/.

  3. Delicious: http://www.delicious.com/.

  4. Digg: http://digg.com/.

  5. YouTube: http://www.youtube.com/.


Author information

Corresponding author

Correspondence to Huzefa Rangwala.

About this article

Cite this article

Hiremagalore, S., Liang, C., Stavrou, A. et al. Improving network response times using social information. Soc. Netw. Anal. Min. 3, 209–220 (2013). https://doi.org/10.1007/s13278-012-0065-9

  • DOI: https://doi.org/10.1007/s13278-012-0065-9
