
Acquisition and Modeling of Website Parameters

  • Conference paper
Advanced Information Networking and Applications (AINA 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 227)


Abstract

Over the last 30 years, the web has evolved from simple informational HTML pages into complex applications supporting business, television, newspapers, entertainment, and more. While many articles address website popularity, little work has examined the complexity of individual web pages. In this article, we present a measurement-driven study of the complexity of today's web pages. We measured 426 866 web pages over about 12 weeks. The study addresses two problems. The first is to describe the complexity of web pages with metrics based on the content they include and the kind of service they offer. The second is to build probabilistic models of the observed distributions. Such models can be used in HTTP request generators that model the work of modern web systems. Separate models are proposed for each category of web pages and for all pages together.
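As a rough illustration of how a fitted distribution might feed a request generator, the sketch below fits a log-normal model to a sample of page sizes and then draws synthetic sizes from it. The abstract does not say which distribution families or parameters the authors used, so the log-normal choice, the function names `fit_lognormal` and `generate_sizes`, and the synthetic input data are all assumptions made for this example only.

```python
import math
import random
import statistics


def fit_lognormal(samples):
    """Return (mu, sigma) of a log-normal fitted by maximum likelihood.

    Assumption: a log-normal family is used purely for illustration; it is
    not taken from the paper.
    """
    logs = [math.log(x) for x in samples]
    mu = statistics.fmean(logs)
    # The maximum-likelihood estimate uses the population standard deviation.
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def generate_sizes(mu, sigma, n, rng):
    """Draw n synthetic values from the fitted log-normal model."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(0)

# Synthetic stand-in for measured page sizes in bytes (NOT data from the paper).
observed = [rng.lognormvariate(13.0, 1.2) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal(observed)
synthetic = generate_sizes(mu_hat, sigma_hat, 1000, rng)

print(f"fitted mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
print(f"median synthetic size: {statistics.median(synthetic):.0f} bytes")
```

In a per-category setup such as the one the abstract describes, one such parameter pair could be fitted per page category and another for all pages together; that mapping is likewise only a sketch.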




Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zatwarnicki, K., Barton, S., Mainka, D. (2021). Acquisition and Modeling of Website Parameters. In: Barolli, L., Woungang, I., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2021. Lecture Notes in Networks and Systems, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-030-75078-7_59
