Abstract
Web farms are clustered systems designed to provide high-availability, high-performance web services. A web farm is a group of replicated HTTP servers that answer web requests forwarded by a single point of access to the service. To accomplish this, the point of access runs a load-balancing algorithm that distributes web requests among the group of servers. Current algorithms provide short-term dynamic configuration for this operation, but some corrective actions (such as granting different session priorities or distributed WAN forwarding) cannot be achieved without a long-term estimate of the future web load. In this paper we propose a method to forecast web service workload. Our approach also includes an innovative segmentation method for web pages using EDAs (estimation of distribution algorithms) and the application of semi-naïve Bayes classifiers to predict future web load several minutes in advance. All our analysis has been performed using real data from a worldwide academic portal.
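To make the forecasting idea concrete, the following is a minimal sketch of predicting a discretized load level a few time steps ahead from the most recent load levels. It is not the method of the paper: it uses a plain naive Bayes classifier with Laplace smoothing rather than the semi-naïve Bayes structures learned with EDAs, and it omits the web page segmentation step. The thresholds in `discretize`, the `window` and `horizon` parameters, and all names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def discretize(count, low=100, high=300):
    """Map a request count to a load level (thresholds are hypothetical)."""
    if count < low:
        return "low"
    if count < high:
        return "medium"
    return "high"


def make_examples(levels, window=2, horizon=3):
    """Pair each window of past levels with the level `horizon` steps later."""
    examples = []
    for t in range(window - 1, len(levels) - horizon):
        features = tuple(levels[t - window + 1 : t + 1])
        examples.append((features, levels[t + horizon]))
    return examples


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, examples):
        self.total = len(examples)
        self.class_counts = Counter(label for _, label in examples)
        # (feature position, class label) -> counts of observed values
        self.feature_counts = defaultdict(Counter)
        self.values = set()
        for features, label in examples:
            for i, value in enumerate(features):
                self.feature_counts[(i, label)][value] += 1
                self.values.add(value)
        return self

    def predict(self, features):
        k = len(self.values)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n / self.total)
            for i, value in enumerate(features):
                c = self.feature_counts[(i, label)][value]
                score += math.log((c + self.alpha) / (n + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Synthetic request counts per interval (made-up data, not from the paper).
counts = [50, 50, 200, 200, 400, 400] * 10
levels = [discretize(c) for c in counts]
model = NaiveBayes().fit(make_examples(levels))
forecast = model.predict(("low", "low"))  # predicted level 3 intervals ahead
```

A forecast of this kind could, in principle, feed the longer-term corrective actions mentioned above, since it looks several intervals ahead rather than reacting to the current load only.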
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peña, J.M., Robles, V., Marbán, Ó., Pérez, M.S. (2004). Bayesian Methods to Estimate Future Load in Web Farms. In: Favela, J., Menasalvas, E., Chávez, E. (eds) Advances in Web Intelligence. AWIC 2004. Lecture Notes in Computer Science, vol 3034. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24681-7_24
DOI: https://doi.org/10.1007/978-3-540-24681-7_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22009-1
Online ISBN: 978-3-540-24681-7
eBook Packages: Springer Book Archive