Elsevier

Future Generation Computer Systems

Volume 110, September 2020, Pages 1055-1066

WWW Retrieval Handling Optimization wρ3: A metric for webpage timeout setting performance evaluation and comparison

https://doi.org/10.1016/j.future.2019.11.023

Highlights

  • A webpage’s webobjects often time out prematurely, forcing non-required downloads.

  • This paper introduces the wρ3 metric and provides its general utilization overview.

  • wρ3 is based on information retrieval approaches for real-world content.

  • wρ3 can be employed to determine a webpage’s performance optimization potentials.

  • wρ3 homogenizes comparisons between different types of websites.

Abstract

The rise of HTTP-based communications and the employment of HTML5 for cross-platform application development have led to a continuous increase of traffic over all types of networks and a continuous demand for content caching. A remaining underlying problem, however, is the timeout of resources, which is typically configured server-side and depends on the type of web site that offers the content. With common web server implementations, the individual webobjects that constitute the overall content cannot readily be configured independently. This results in premature timeouts of webobjects, with undesired consequences for download times, rendering speed, and energy consumption.

We present and evaluate WWW Retrieval Handling Optimization wρ3, which provides optimization potentials for webpages. wρ3 can also be used to derive hints for fine-tuning the performance of the server-side configurations and to compare the performance of timeouts between web sites of different categories. Our metric features a tunable preference between counts of webobjects and the amount of data required and is a function of the re-visiting time. We showcase the applicability of wρ3 over a range of different web sites and categories, additionally evaluating the applicability over time horizons and worldwide request locations. The overarching flexibility and ease of use together with its overall performance evaluation results make wρ3 an ideal candidate as a generic web site performance measurement tool.

Introduction

Recent years have seen the continuous proliferation of HTTP-based services into all domains of networked application scenarios. The Internet Protocol (IP) is considered the big unifier of networked data delivery following the “everything over IP and IP over everything” paradigm. Indeed, HTTP has been considered a second “hourglass” or “waist” of the modern internet [1], [2] for application and service delivery on top of the IP-based networked data delivery.

This is due to HTTP’s ubiquitous utilization, which ranges from the delivery of a simple webobject for an HTML-based web page over video streaming employing DASH [3] to desktop and mobile cross-platform application development [4]. With vast amounts of data being requested and transported using web-based technologies, optimizing the required bandwidth and the delivery itself is of considerable importance. Commonly, such reductions employ some approach to caching [5]. The benefits of potential optimizations in this regard are several. First, service and network providers benefit through direct reductions of network traffic. Second, fewer webobject delivery requests can significantly reduce content display latencies (which can otherwise lead to customer churn [6], [7]). Third, users of mobile devices benefit: a reduction in the number of lower-layer frames lowers network interface utilization and thereby saves battery. As the most energy-efficient message is typically the one that does not need to be sent, network and service providers would similarly be “greening” their operations [8]. The attainable benefits typically depend closely on the timeout settings that the server side imposes on its communication partners.

In our prior works, we coined the term Webobject Access Setting Timeout Excess (WASTE), see [9], which follows the principle of determining the amount of data that clients download redundantly due to refreshes enforced from the server side. In those works, we quantified and evaluated the absolute impacts in terms of data amounts over time. A significant complicating issue, however, is the nature of the underlying web sites and pages themselves. Web pages for different types of content offerings typically require different refresh intervals as well as different amounts of data. Consider the intuitive scenario of comparing a common web search landing page that displays minimal content with a news web page that includes significant amounts of content, including several types of multimedia. The resulting problem is how to quantify and evaluate the performance of different web pages in this context and how to derive a more comparable, universal metric.
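To make the notion of redundant downloads concrete, the following minimal Python sketch counts the bytes a client would re-download although the content is unchanged. It is only an illustration of the general idea and not the WASTE computation from [9]: the `WebObject` fields (`max_age_s`, `digest`), the function names, and the sample values are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebObject:
    url: str
    size_bytes: int
    max_age_s: float  # assumed: freshness lifetime assigned by the server
    digest: str       # assumed: hash of the object body at the first visit


def redundant_downloads(first_visit, revisit_digests, tau_s):
    """Objects that expired within tau_s although their content is unchanged."""
    return [
        obj for obj in first_visit
        if tau_s >= obj.max_age_s
        and revisit_digests.get(obj.url) == obj.digest
    ]


def redundant_bytes(first_visit, revisit_digests, tau_s):
    """Total size of the redundantly re-downloaded objects."""
    return sum(o.size_bytes
               for o in redundant_downloads(first_visit, revisit_digests, tau_s))


# Hypothetical page with three webobjects and made-up sizes and lifetimes.
page = [
    WebObject("/logo.png", 20_000, 60, "a1"),
    WebObject("/app.js", 80_000, 3600, "b2"),
    WebObject("/news.json", 5_000, 0, "c3"),
]
# Digests observed at the re-visit: only /news.json actually changed.
revisit = {"/logo.png": "a1", "/app.js": "b2", "/news.json": "c4"}

print(redundant_bytes(page, revisit, tau_s=300))   # only /logo.png is redundant
print(redundant_bytes(page, revisit, tau_s=7200))  # /logo.png and /app.js
```

In this toy setting, `/news.json` is never counted because its content really changed, whereas the logo and the script are re-fetched only because their lifetimes ran out.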

We move closer to solving this problem with the wρ3 metric (“www-rho”) that we describe here. While traditional metrics that target cached content (such as hit or miss ratios) have a single-dimensional view on either data amounts or webobject counts, our metric combines these two while additionally taking the time between requests into account. This novel approach allows for the comparison of webpages of different genres, which would otherwise be unfair. We note that, in turn, our metric is closer to the tuning of websites with respect to the webobjects constituting the page and less focused on a user perspective than, e.g., Google’s Speed Index [10]. Our metric is also not geared towards the complexity of a webpage, such as in [11], but rather towards making webpages of different complexity levels comparable with respect to how they handle their respective content over time. wρ3 is based on information retrieval approaches commonly employed in the data sciences. The metric exploits tunable, relative approaches to the determination of a web page’s optimization level. In this contribution, we derive the wρ3 metric and evaluate its application across the landing pages of several popular web sites. wρ3 can be employed by content developers and web server administrators alike to determine the performance of their web content delivery. For network providers, wρ3 can be useful in future cache tuning approaches, which is beyond the scope of this contribution.
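As a rough illustration of what a tunable preference between webobject counts and data amounts can look like, the Python sketch below blends a count-based share and a byte-based share of redundant downloads with a weight `alpha`. This linear blend is an assumption chosen for simplicity; it is not the definition of wρ3, which is derived in Section 3. The function name and the input format are hypothetical.

```python
def blended_redundancy(objects, alpha):
    """Blend count-based and byte-based shares of redundant webobjects.

    objects: list of (size_bytes, is_redundant) pairs for one re-visit.
    alpha:   weight in [0, 1]; 1.0 considers counts only, 0.0 bytes only.
    NOTE: illustrative linear blend, not the wρ3 definition.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not objects:
        return 0.0
    total_bytes = sum(size for size, _ in objects)
    redundant_count = sum(1 for _, redundant in objects if redundant)
    redundant_bytes = sum(size for size, redundant in objects if redundant)
    count_share = redundant_count / len(objects)
    byte_share = redundant_bytes / total_bytes if total_bytes else 0.0
    return alpha * count_share + (1.0 - alpha) * byte_share


# Hypothetical re-visit: one small redundant object, one large required one.
sample = [(1000, True), (3000, False)]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, blended_redundancy(sample, alpha))
```

Because both shares are relative to the page's own totals, a small search page and a large news page yield values on the same scale, which is the kind of comparability the paragraph above argues for.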

The remainder of this article is structured as follows. In the following section, we review related works and place our contribution into context, particularly with regard to the common caching proxy server approaches in use today. We describe our wρ3 metric in Section 3 before we describe our general evaluation, followed by a discussion of the obtained results, in Section 4. We specifically evaluate our metric with respect to its long-term stability and locality in Sections 4.4 and 4.5, respectively. We continue with a discussion of usage and implications in Section 5 before we conclude with an outlook on future works in Section 6.

Section snippets

Related works

The focus of modern networking application scenarios on web-based approaches has resulted in HTTP becoming the de-facto upper layer hourglass waist in the modern network protocol stack [1]. This is accompanied by a significant push towards a unified front-end application development based around HTML5/CSS/JavaScript that leverages web technologies for device agnosticism. Caching content has emerged as an effective method to decrease the amount of data required to be delivered to fixed and

WWW Retrieval Handling Optimization metric wρ3

In this section, we describe wρ3 and continue to follow the point of view of a caching web proxy server that performs interactions on requested webobjects. We note, however, that we do not consider other optimization approaches (e.g., proactive intermediate cache management or other efforts as outlined in [9]) to fix the overall idea of employing the WASTE concept for the determination of the efficiency of any generic webpage.

General results for wρ3

In this section, we discuss the obtained results for a measurement study conducted across a range of different web sites.

Implications and uses

Initially, the relative continuity that is showcased in Fig. 1 indicates that for the evaluated set of webpages, significant optimization potentials through modifying webobject timeouts exist. These, however, have to be seen in conjunction with the nature of a website, which brings the view to longer timescales, such as illustrated in Fig. 3. For example, returning to the CNN news webpage after about a day should, intuitively, result in significant content expiration and, thus, reduced

Conclusion

WWW Retrieval Handling Optimization wρ3 provides a straightforward indication of optimization potentials and enables the performance comparison of webobject retrievals across different types of webpages. Specifically, w3ρ(t,τ,α) determines the optimization potentials of a webpage’s webobject retrieval (counts and sizes weighted by α) when re-visiting within τ seconds. For a sample of six popular webpages, we find that even for an immediate re-visit a major inefficiency in the configuration of

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (28)

  • Popa L. et al.

    HTTP as the narrow waist of the future Internet

  • Wood L. et al.

    Moving data in DTNs with HTTP and MIME

  • Stockhammer T.

    Dynamic Adaptive Streaming over HTTP: Standards and Design Principles

    (2011)

  • Charland A. et al.

    Mobile application development: web vs. native

    ACM Queue

    (2011)

  • Paschos G.S. et al.

    The role of caching in future communication systems and networks

    IEEE J. Sel. Areas Commun.

    (2018)

  • Baraković S. et al.

    Survey of research on quality of experience modelling for web browsing

    Qual. User Exp.

    (2017)

  • Niida S. et al.

    A survey on modeling of human states in communication behavior

    IEICE Trans.

    (2017)

  • Aslan J. et al.

    Electricity intensity of internet data transmission: Untangling the estimates

    J. Ind. Ecol.

    (2018)

  • Bailey M.J. et al.

    Forward caching hints to reduce waste

  • Hossfeld T. et al.

    Speed index: Relating the industrial standard for user perceived web performance to web QoE

  • Butkiewicz M. et al.

    Characterizing web page complexity and its impact

    IEEE/ACM Trans. Netw.

    (2014)

  • Maddah-Ali M.A. et al.

    Fundamental limits of caching

    IEEE Trans. Inform. Theory

    (2014)

  • Thiagarajan N. et al.

    Who killed my battery?: analyzing mobile browser energy consumption

  • Bianzino A.P. et al.

    Greening the internet: Measuring web power consumption

    IT Prof.

    (2011)

    Patrick Seeling is a Professor in the Department of Computer Science at Central Michigan University (Mount Pleasant, Michigan, USA). He received his Dipl.-Ing. degree in Industrial Engineering and Management from the Technische Universität Berlin (Berlin Institute of Technology, Berlin, Germany) in 2002 and his Ph.D. in Electrical Engineering from Arizona State University (Tempe, Arizona, USA) in 2005. He was a Faculty Research Associate and Associated Faculty with the Department of Electrical Engineering at Arizona State University from 2005 to 2007. From 2008 to 2011, he was an Assistant Professor in the Department of Computing and New Media Technologies at the University of Wisconsin-Stevens Point (Stevens Point, Wisconsin, USA). In 2011, he joined Central Michigan University as Assistant Professor, where he became a tenured Associate Professor in 2015 and a Full Professor in 2018.

    Patrick Seeling’s research interests comprise next-generation networking and the Tactile Internet, resulting user experiences in mixed realities, combinations thereof in distributed and mobile systems, and computer-mediated education. He has published over 100 journal articles and conference papers, as well as books, book chapters, and tutorials. He actively contributes to the profession as editorial board member, reviewer and programme committee member for several journals and conferences. Patrick Seeling is a Senior Member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE).
