Bayesian Methods to Estimate Future Load in Web Farms

  • Conference paper
Advances in Web Intelligence (AWIC 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3034)

Abstract

Web farms are clustered systems designed to provide highly available, high-performance web services. A web farm is a group of replicated HTTP servers that reply to web requests forwarded by a single point of access to the service. To do so, the point of access runs a load balancing algorithm that distributes web requests among the servers. Current algorithms provide short-term dynamic configuration for this operation, but some corrective actions (such as granting different session priorities or distributed WAN forwarding) cannot be taken without a long-term estimate of the future web load. In this paper we propose a method to forecast the workload of a web service. Our approach includes an innovative method for segmenting web pages using EDAs (estimation of distribution algorithms) and applies semi-naïve Bayes classifiers to predict the future web load several minutes in advance. All of our analysis was performed using real data from a worldwide academic portal.
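As a rough illustration of the general idea of predicting a discretized future load level from recent load levels, the following is a minimal sketch only. It uses a plain naive Bayes classifier with Laplace smoothing, not the semi-naïve Bayes structures learned with EDAs that the paper proposes, and it does not model the page segmentation step. The function names, the bin thresholds, the window width, and the synthetic request counts are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def discretize(value, thresholds):
    """Map a numeric load value to a bin index (0 = lowest level)."""
    return sum(value >= t for t in thresholds)


def make_windows(levels, width, horizon):
    """Build (recent levels -> level `horizon` steps ahead) training pairs."""
    rows, labels = [], []
    for t in range(width, len(levels) - horizon + 1):
        rows.append(tuple(levels[t - width:t]))
        labels.append(levels[t + horizon - 1])
    return rows, labels


class NaiveBayes:
    """Plain categorical naive Bayes with Laplace smoothing (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> values seen in training

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.feature_counts[(i, label)][value] += 1
                self.feature_values[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per feature.
            score = math.log(count / total)
            for i, value in enumerate(row):
                num = self.feature_counts[(i, label)][value] + self.alpha
                den = count + self.alpha * len(self.feature_values[i])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Synthetic requests-per-minute series (assumed data, not from the paper).
counts = [50, 250, 700] * 10
levels = [discretize(c, thresholds=[100, 500]) for c in counts]

rows, labels = make_windows(levels, width=2, horizon=1)
model = NaiveBayes().fit(rows, labels)
predicted_level = model.predict((0, 1))  # last two minutes: low, then medium
```

Increasing `horizon` in `make_windows` moves the predicted level further into the future, which is the kind of several-minutes-ahead estimate the abstract describes, although the features and classifier here are far simpler than the paper's.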

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peña, J.M., Robles, V., Marbán, Ó., Pérez, M.S. (2004). Bayesian Methods to Estimate Future Load in Web Farms. In: Favela, J., Menasalvas, E., Chávez, E. (eds) Advances in Web Intelligence. AWIC 2004. Lecture Notes in Computer Science, vol 3034. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24681-7_24

  • DOI: https://doi.org/10.1007/978-3-540-24681-7_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22009-1

  • Online ISBN: 978-3-540-24681-7

  • eBook Packages: Springer Book Archive
