Abstract
Social networks and discussion boards have become significant outlets for people to communicate and freely express their opinions. Although the social networks themselves are usually well-provisioned, participating users frequently link to external websites to substantiate their discussions. The heavy traffic suddenly imposed on these externally linked websites often makes them unresponsive, a phenomenon known as the “flash crowd effect.” Flash crowds present a real challenge because their intensity and occurrence times are impossible to predict. Moreover, most present-day web hosting servers and caching systems, although increasingly capable, are designed to handle only a nominal request load; beyond it they become unresponsive because of the limited bandwidth or processing power allocated to the hosting site. In this paper, we quantify the prevalence of flash crowd events for a popular social discussion board (Digg). Using PlanetLab, we measured the response times of 1,289 unique popular websites and verified that 89% of the popular URLs suffered variations in their response times. In an effort to identify flash crowds in advance, we evaluated and compared traffic forecasting mechanisms. We showed that predicting network traffic from network measurements has very limited success and cannot be used for large-scale prediction. However, by analyzing the content and structure of social discussions, we were able to accurately forecast popularity for 86% of the websites within 5 minutes of a story’s submission, and for 95% of the sites once more social content (5 hours’ worth) became available. Our work indicates that social activity can be leveraged effectively to forecast network events that would otherwise be infeasible to anticipate.
Notes
Digg: http://digg.com/.
Reddit: http://www.reddit.com/.
Delicious: http://www.delicious.com/.
YouTube: http://www.youtube.com/.
Cite this article
Hiremagalore, S., Liang, C., Stavrou, A. et al. Improving network response times using social information. Soc. Netw. Anal. Min. 3, 209–220 (2013). https://doi.org/10.1007/s13278-012-0065-9
DOI: https://doi.org/10.1007/s13278-012-0065-9