WWW Retrieval Handling Optimization: A metric for webpage timeout setting performance evaluation and comparison
Introduction
Recent years have seen the continuous proliferation of HTTP-based services into all domains of networked application scenarios. The Internet Protocol (IP) is considered the big unifier of networked data delivery following the “everything over IP and IP over everything” paradigm. Indeed, HTTP has been considered a second “hourglass” or “waist” of the modern internet [1], [2] for application and service delivery on top of the IP-based networked data delivery.
This is due to HTTP’s ubiquitous utilization, ranging from the delivery of a simple webobject for an HTML-based web page, through video streaming employing DASH [3], to desktop and mobile cross-platform application development [4]. With vast amounts of data being requested and transported employing web-based technologies, optimizing the required bandwidth and delivery is of considerable importance. Commonly, such reductions employ some approach to caching [5]. The benefits of potential optimizations in this regard accrue first to the service and network providers through direct reductions of network traffic. Secondly, reducing network traffic by reducing webobject delivery requests can significantly lower content display latencies (which otherwise can lead to customer churn [6], [7]). An additional benefit can be realized by those employing mobile devices: a reduction in the amount of lower-layer frames directly benefits device users through reduced network interface utilization and subsequent battery savings. As the most energy-efficient message is typically the one that does not need to be sent, network and service providers would similarly be “greening” their operations [8]. The attainable benefits typically stand in close interplay with the timeout settings that the server side imposes on the communication partners.
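The server-side timeout settings mentioned above are conveyed to clients through HTTP caching headers, most notably the `Cache-Control` `max-age` directive. As a minimal sketch (the function name and the simplified directive parsing are our own, not taken from the article), the following checks whether a cached webobject is still fresh under its server-imposed timeout:

```python
def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """Return True if a cached webobject is still fresh under its
    server-imposed Cache-Control timeout; otherwise a refresh is forced."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    # no-store / no-cache force a revalidation or re-download on every visit
    if "no-store" in directives or "no-cache" in directives:
        return False
    for d in directives:
        if d.startswith("max-age="):
            try:
                return age_seconds < int(d.split("=", 1)[1])
            except ValueError:
                return False
    # No freshness information: conservatively assume a refresh is required.
    return False
```

For example, `is_fresh("public, max-age=3600", 60)` holds, while any object marked `no-store` is re-downloaded on every visit regardless of how recently it was fetched.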
In our prior works, we coined the term Webobject Access Setting Timeout Excess (WASTE), see [9], which followed the principle of determining the amount of data that clients redundantly download due to refreshes enforced from the server side. In those works, we quantified and evaluated the absolute impacts in terms of data amounts over time. A significant complicating issue, however, is the nature of the underlying web sites and pages themselves. Web pages for different types of content offerings typically require different refresh rates, as well as different amounts of data. Consider the intuitive scenario of comparing a common web search landing page with minimal content display to a news web page that includes significant amounts of content, including several types of multimedia. A resulting problem is how to quantify and evaluate the performance of different web pages in this context and how to derive a more comparable, universal metric.
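The WASTE principle can be illustrated with a small sketch (the function name and the per-object representation are our own simplification, not the article's formulation): given a page's webobjects with their sizes and server-imposed timeouts, the data redundantly downloaded on a re-visit after a given time is the total size of all objects whose timeouts have already expired, regardless of whether the content actually changed:

```python
def redundant_bytes(objects, revisit_after):
    """Bytes a client must re-download when re-visiting a page after
    `revisit_after` seconds: every webobject whose server-side timeout
    (max-age, in seconds) has expired is fetched again, even if unchanged.

    `objects` is a list of (size_bytes, max_age_seconds) tuples."""
    return sum(size for size, max_age in objects if revisit_after >= max_age)
```

For instance, for a page with objects `[(100_000, 3600), (5_000, 0), (250_000, 86400)]`, a re-visit after 60 s redundantly transfers only the zero-timeout object (5,000 bytes), while a re-visit after two hours additionally re-fetches the one-hour object.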
We move closer to solving this problem with the ρ metric (“www-rho”) that we describe here. While traditional metrics that target cached content (such as hit or miss ratios) have a single-dimensional view on data or webobject counts, our metric combines these two while additionally taking the time between requests into account. This novel approach allows for a comparison of webpages across different genres that would otherwise be unfair. We note that, in turn, our metric is closer to the tuning of websites with respect to the webobjects constituting the page and less focused on a user perspective as, e.g., Google’s Speed Index [10]. Our metric is also not geared towards the complexity of a webpage, such as in [11], but rather towards making webpages of different complexity levels comparable with respect to how they handle their respective content over time. ρ is based on information retrieval approaches commonly employed in the data sciences. The metric exploits tunable, relative approaches to the determination of a web page’s optimization level. In this contribution, we derive the metric and evaluate its application across the landing pages of several popular web sites. ρ can be employed by content developers and web server administrators alike to determine the performance of their web content delivery. For network providers, ρ can be useful in future cache tuning approaches, which is beyond the scope of this contribution.
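Purely as an illustration of the idea of combining webobject counts and data amounts into one tunable, relative score (this is not the paper's definition of ρ, which is derived in Section 3; the function and the weighting parameter are our own), one could weight the redundant fractions of both dimensions:

```python
def rho_sketch(redundant_count, total_count, redundant_bytes, total_bytes, w=0.5):
    """Illustrative relative score, NOT the paper's www-rho definition:
    a tunable weighted average of the redundant fractions of webobject
    counts and of data volume. 0 means no redundant re-downloads on a
    re-visit; 1 means every object and byte is re-fetched."""
    if total_count == 0 or total_bytes == 0:
        return 0.0
    return w * (redundant_count / total_count) + (1 - w) * (redundant_bytes / total_bytes)
```

Because both dimensions are normalized to the page's own totals, a sparse search landing page and a multimedia-heavy news page map onto the same [0, 1] scale, which is the comparability property the article targets.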
The remainder of this article is structured as follows. In the following section, we review related works and place our contribution into context, particularly with regard to the common caching proxy server approaches in use today. We describe our metric in Section 3 before we describe our general evaluation followed by a discussion of obtained results in Section 4. We specifically evaluate our metric with respect to its long-term stability and locality in Sections 4.4 and 4.5, respectively. We continue with a discussion of usage and implications in Section 5 before we conclude with an outlook on future works in Section 6.
Section snippets
Related works
The focus of modern networking application scenarios on web-based approaches has resulted in HTTP becoming the de-facto upper layer hourglass waist in the modern network protocol stack [1]. This is accompanied by a significant push towards a unified front-end application development based around HTML5/CSS/JavaScript that leverages web technologies for device agnosticism. Caching content has emerged as an effective method to decrease the amount of data required to be delivered to fixed and
WWW Retrieval Handling Optimization metric
In this section, we describe the ρ metric and continue to follow the point of view of a caching web proxy server that performs interactions on requested webobjects. We note, however, that we do not consider other optimization approaches (e.g., proactive intermediate cache management or other efforts as outlined in [9]) to fix the overall idea of employing the WASTE concept for the determination of the efficiency of any generic webpage.
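The caching-proxy viewpoint taken above can be sketched as a toy cache (class and method names are our own, chosen for illustration) that serves webobjects while they are within their server-imposed timeouts and counts how many requests expire into forced re-fetches:

```python
class TimeoutCache:
    """Toy caching web proxy: stores webobjects together with their
    server-imposed timeouts and counts cache hits versus re-fetches
    forced by expired timeouts."""

    def __init__(self):
        self.store = {}      # url -> (stored_at, max_age, size)
        self.hits = 0
        self.refetches = 0

    def request(self, url, now, fetch):
        """Serve `url` at time `now`; `fetch(url)` returns (size, max_age)."""
        entry = self.store.get(url)
        if entry is not None:
            stored_at, max_age, size = entry
            if now - stored_at < max_age:   # still fresh: serve from cache
                self.hits += 1
                return size
        # Timeout expired (or never cached): the origin must be contacted.
        self.refetches += 1
        size, max_age = fetch(url)
        self.store[url] = (now, max_age, size)
        return size
```

Replaying a page's request log through such a cache at different re-visit intervals exposes exactly the count/size redundancies that the metric aggregates.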
General results for ρ
In this section, we discuss the obtained results for a measurement study conducted across a range of different web sites.
Implications and uses
Initially, the relative continuity that is showcased in Fig. 1 indicates that, for the evaluated set of webpages, significant optimization potentials through modifying webobject timeouts exist. These, however, have to be seen in conjunction with the nature of a website, which brings the view to longer timescales, such as illustrated in Fig. 3. For example, returning to the CNN news webpage after about a day should, intuitively, result in significant content expiration and, thus, reduced
Conclusion
The WWW Retrieval Handling Optimization metric ρ provides a straightforward indication of optimization potentials and enables the performance comparison of webobject retrievals across different types of webpages. Specifically, ρ determines the optimization potentials of a webpage’s webobject retrieval (counts and sizes, appropriately weighted) when re-visiting within a given number of seconds. For a sample of six popular webpages, we find that even for an immediate re-visit a major inefficiency in the configuration of
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (28)
- et al., HTTP as the narrow waist of the future internet
- et al., Moving data in DTNs with HTTP and MIME
- Dynamic Adaptive Streaming over HTTP: Standards and Design Principles (2011)
- et al., Mobile application development: web vs. native, ACM Queue (2011)
- et al., The role of caching in future communication systems and networks, IEEE J. Sel. Areas Commun. (2018)
- et al., Survey of research on quality of experience modelling for web browsing, Qual. User Exp. (2017)
- et al., A survey on modeling of human states in communication behavior, IEICE Trans. (2017)
- et al., Electricity intensity of internet data transmission: untangling the estimates, J. Ind. Ecol. (2018)
- et al., Forward caching hints to reduce WASTE
- et al., Speed Index: relating the industrial standard for user perceived web performance to Web QoE
- Characterizing web page complexity and its impact, IEEE/ACM Trans. Netw.
- Fundamental limits of caching, IEEE Trans. Inform. Theory
- Who killed my battery? Analyzing mobile browser energy consumption
- Greening the internet: measuring web power consumption, IT Prof.
Patrick Seeling is a Professor in the Department of Computer Science at Central Michigan University (Mount Pleasant, Michigan, USA). He received his Dipl.-Ing. degree in Industrial Engineering and Management from the Technische Universität Berlin (Berlin Institute of Technology, Berlin, Germany) in 2002 and his Ph.D. in Electrical Engineering from Arizona State University (Tempe, Arizona, USA) in 2005. He was a Faculty Research Associate and Associated Faculty with the Department of Electrical Engineering at Arizona State University from 2005 to 2007. From 2008 to 2011, he was an Assistant Professor in the Department of Computing and New Media Technologies at the University of Wisconsin-Stevens Point (Stevens Point, Wisconsin, USA). In 2011, he joined Central Michigan University as Assistant Professor, where he became a tenured Associate Professor in 2015 and a Full Professor in 2018.
Patrick Seeling’s research interests comprise next-generation networking and the Tactile Internet, resulting user experiences in mixed realities, combinations thereof in distributed and mobile systems, and computer-mediated education. He has published over 100 journal articles and conference papers, as well as books, book chapters, and tutorials. He actively contributes to the profession as editorial board member, reviewer and programme committee member for several journals and conferences. Patrick Seeling is a Senior Member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).