Abstract
During the last 30 years, the web has evolved from simple informational HTML pages into complex applications supporting business, television, newspapers, entertainment, and other domains. While there are many articles on website popularity, there has been little work on understanding the complexity of individual web pages. In this article, we present a measurement-driven study of the complexity of today's web pages. We measured 426,866 web pages over about 12 weeks. The study addresses two problems. The first is to describe the complexity of a web page with metrics based on the content it includes and the kind of service it offers. The second is to build probabilistic models of the observed distributions. Such models can be used in HTTP request generators that model the workload of modern web systems. Separate models are proposed for each category of web pages and for all pages together.
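The abstract does not state which distribution families were fitted, so the following is only a minimal sketch of how a fitted model of one page parameter might drive a request generator. It assumes a log-normal family purely for illustration, fits it by maximum likelihood (mean and population standard deviation of the log-transformed values), and fits one model per category plus one for all pages pooled. The class name `LogNormalModel`, the function `fit_per_category`, the category names, and the sample values are all hypothetical and are not data or code from the study.

```python
import math
import random
import statistics
from dataclasses import dataclass


@dataclass
class LogNormalModel:
    """Log-normal model of one page parameter (family ASSUMED for illustration)."""
    mu: float     # mean of ln(x)
    sigma: float  # standard deviation of ln(x)

    @classmethod
    def fit(cls, samples):
        """Maximum-likelihood fit: mu and sigma are the mean and the population
        standard deviation of the log-transformed samples."""
        if len(samples) < 2 or any(x <= 0 for x in samples):
            raise ValueError("need at least two strictly positive samples")
        logs = [math.log(x) for x in samples]
        return cls(mu=statistics.fmean(logs), sigma=statistics.pstdev(logs))

    def median(self):
        """Median of a log-normal distribution is exp(mu)."""
        return math.exp(self.mu)

    def sample(self, rng):
        """Draw one synthetic value, e.g. a page size for a request generator."""
        return rng.lognormvariate(self.mu, self.sigma)


def fit_per_category(measurements):
    """Fit one model per (hypothetical) category plus a pooled model under 'all'."""
    models = {cat: LogNormalModel.fit(vals) for cat, vals in measurements.items()}
    pooled = [v for vals in measurements.values() for v in vals]
    models["all"] = LogNormalModel.fit(pooled)
    return models


if __name__ == "__main__":
    # Hypothetical page sizes in KB; NOT measurements from the study.
    observed = {
        "news": [1800.0, 2500.0, 3100.0, 2200.0, 4000.0],
        "shop": [1200.0, 1500.0, 900.0, 2100.0, 1700.0],
    }
    models = fit_per_category(observed)
    rng = random.Random(42)  # fixed seed so the synthetic workload is reproducible
    for category, model in models.items():
        synthetic = [round(model.sample(rng)) for _ in range(3)]
        print(category, round(model.median()), synthetic)
```

Keeping a pooled model next to the per-category ones mirrors the abstract's choice of separate models for each category and for all pages together. In practice, the family itself would be chosen by comparing candidate distributions with a goodness-of-fit test rather than assumed as it is here.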
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zatwarnicki, K., Barton, S., Mainka, D. (2021). Acquisition and Modeling of Website Parameters. In: Barolli, L., Woungang, I., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2021. Lecture Notes in Networks and Systems, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-030-75078-7_59
DOI: https://doi.org/10.1007/978-3-030-75078-7_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75077-0
Online ISBN: 978-3-030-75078-7
eBook Packages: Intelligent Technologies and Robotics (R0)