Performance improvement of web caching in Web 2.0 via knowledge discovery
Introduction
Content aggregation systems (CAS) are Web 2.0 applications in which users create their own web pages by aggregating content. The web pages retrieve content from distributed sources and assemble it into a single web page. These applications differ from other content management systems (CMS) in that users do not create the content themselves: they only set up the web pages by aggregating content from public services and remote sources (web services, RSS, etc.). The web application retrieves independent pieces of content, called content elements (CEs), from these sources.
These web pages have high update rates because the content is retrieved from different sources and the web page needs to be updated every time a single source changes. They are also highly customizable.
There are many examples of web applications that fit the CAS model: social networks, web blogs, feed aggregation tools, etc. We focused our study on personal start pages (Yahoo! Pipes, iGoogle, Netvibes and PageFlakes).
Depending on the system tier where the assembly of the content elements takes place, the cache manages and stores either whole, indivisible web pages or individual content elements. When the assembly is done in the web proxy cache, the cache can store content elements (CEs) independently. This improves the hit ratio of the web cache (only the invalidated content elements are requested from the web server), but it increases the overhead of assembling all the content elements. On the other hand, when the assembly takes place in the web application server, the invalidation of a single content element invalidates the whole web page, and the cache must request all the content of the web page from the server. This reduces the hit ratio, but only one server request is generated, so the overhead of server connections and assembly is lower.
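The trade-off described above can be sketched with a simple cost model. Everything below is an illustrative assumption, not a figure or formula from the paper: page-level caching pays for a full page transfer whenever any element is stale, while fragment-level caching re-fetches only the stale elements but pays a per-element connection and assembly cost at the proxy.

```python
# Hypothetical latency model for the page-level vs fragment-level trade-off.
# All parameter names and values are illustrative, not experimental data.

def page_level_latency(n_elements, p_invalid, rtt, transfer, assemble):
    """Whole pages are cached: if any element is invalidated, the full page
    is rebuilt and shipped with a single server request."""
    p_page_stale = 1 - (1 - p_invalid) ** n_elements  # any element stale
    return p_page_stale * (rtt + n_elements * transfer)

def fragment_level_latency(n_elements, p_invalid, rtt, transfer, assemble):
    """Elements are cached independently: only stale elements are fetched
    (one connection each), and the proxy assembles every element itself."""
    expected_stale = n_elements * p_invalid
    return expected_stale * (rtt + transfer) + n_elements * assemble

# With mostly-fresh elements, fragment caching avoids shipping the page;
# with highly volatile elements, the per-fragment connections dominate.
low = (fragment_level_latency(10, 0.05, 20, 30, 1),
       page_level_latency(10, 0.05, 20, 30, 1))
high = (fragment_level_latency(10, 0.8, 20, 30, 1),
        page_level_latency(10, 0.8, 20, 30, 1))
```

Under these toy numbers the cheaper scheme flips as the invalidation probability grows, which is the balance the intermediate schemes in this work try to exploit.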
Our research focused on creating intermediate schemes in which some content elements are assembled in the application server and others in the web proxy cache. The gains in hit ratio are thus balanced against the losses in overhead, yielding shorter user-perceived latencies. Our hypothesis was that this adaptation can be performed with decision trees created in a previous, off-line knowledge discovery process. We proposed obtaining this knowledge from the characteristics of the contents and from the performance results of emulating synthetic content models. The contributions of this research work are: (a) the definition of a framework that adapts the content fragments of a web page to reduce the user-perceived latency in content aggregation systems; (b) the deployment of the adaptive core of the framework using knowledge discovery techniques, by mining performance data obtained in an off-line process with synthetic content models; and (c) the experimental validation of the proposed framework and of the use of knowledge discovery in a real system with contents of a real website.
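As a rough illustration of the kind of per-element decision such an adaptive core makes (a hand-written stand-in, not the decision trees actually mined in the paper's off-line phase), each content element could be assigned an assembly tier from simple features. The feature names and thresholds below are hypothetical:

```python
# Toy stand-in for a mined decision rule: route each content element (CE)
# either to the proxy cache (assembled at the cache, fetched independently)
# or to the application server (assembled with the rest of the page).
# Features and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class ContentElement:
    name: str
    update_rate: float  # invalidations per hour (hypothetical feature)
    size_kb: float      # payload size (hypothetical feature)

def assembly_tier(ce: ContentElement) -> str:
    if ce.update_rate > 10.0:   # volatile: cache it separately at the proxy
        return "proxy"
    if ce.size_kb > 50.0:       # large but stable: avoid re-shipping it
        return "proxy"
    return "server"             # small and stable: assemble server-side

page = [
    ContentElement("stock-ticker", update_rate=120.0, size_kb=2.0),
    ContentElement("blog-roll", update_rate=0.5, size_kb=80.0),
    ContentElement("static-banner", update_rate=0.01, size_kb=5.0),
]
plan = {ce.name: assembly_tier(ce) for ce in page}
```

In the actual framework this rule would be replaced by decision trees learned from the emulated performance of synthetic content models.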
Section snippets
Background and related work
Systems in which the contents have high update rates and high levels of customization reduce the performance of web caching techniques. This is the main obstacle to applying caching in current Web 2.0 systems and the main motivation of our research.
It is well known that one solution to the problem of web caching in Web 2.0 systems is to reduce the minimum cacheable unit (Yuan et al., 2003). The cache then manages fragments of the web pages instead of complete web pages. In this type of systems,
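The idea of a smaller cacheable unit can be sketched as follows; the `FragmentCache` class and its version/fetch callbacks are our own illustration under assumed interfaces, not the design of the cited work. Only fragments whose version at the origin has changed are re-fetched, and the page is assembled at the cache:

```python
# Minimal sketch of fragment-level caching (hypothetical API): the cache
# stores each fragment with a version tag and re-fetches only fragments
# whose origin version changed, then assembles the page locally.

class FragmentCache:
    def __init__(self, origin_fetch, origin_version):
        self._fetch = origin_fetch      # fragment_id -> fresh content
        self._version = origin_version  # fragment_id -> current version
        self._store = {}                # fragment_id -> (version, content)
        self.hits = 0
        self.misses = 0

    def get_page(self, fragment_ids):
        parts = []
        for fid in fragment_ids:
            current = self._version(fid)
            cached = self._store.get(fid)
            if cached is not None and cached[0] == current:
                self.hits += 1          # fragment still valid: serve locally
                parts.append(cached[1])
            else:
                self.misses += 1        # fetch only this stale fragment
                content = self._fetch(fid)
                self._store[fid] = (current, content)
                parts.append(content)
        return "".join(parts)           # assembly happens at the cache

versions = {"a": 1, "b": 1}
cache = FragmentCache(lambda f: f + str(versions[f]), lambda f: versions[f])
first = cache.get_page(["a", "b"])   # both fragments miss (cold cache)
versions["b"] = 2                    # only fragment "b" is invalidated
second = cache.get_page(["a", "b"])  # "a" hits; only "b" is re-fetched
```

Compared with whole-page caching, invalidating `"b"` here costs one fragment fetch instead of a full page rebuild, which is exactly the hit-ratio benefit discussed above.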
Proposed approach
The COFRADIAS framework (COntent FRagment ADaptation In web Aggregation Systems) is our proposal, which extends the basic scheme of a content aggregation system with an adaptive core. The COFRADIAS framework (Fig. 2) is not only the definition of the interfaces between the new adaptive core and the tiers of a traditional CAS; it is also the design solution adopted to adapt the tiers of a CAS to a new type of element: the content fragment.
The interaction between the proxy and the
Validation
The validation of our approach is based on comparing the performance results and the overhead of our solution with those of a traditional web cache scheme for content aggregation systems and with solutions proposed in other research works that address the same problem. The results of our approach are compared with one of the traditional web cache schemes for CAS and with the work of Hassan et al. (2010).
In relation with traditional web cache schemes, we have proved in previous results (Guerrero
Conclusions and future work
The work presented in this article aimed to improve the user-perceived latency in Web 2.0 systems based on the aggregation of content from remote sources. The improvement was based on the creation of fragments of the web pages which can be managed independently by the web cache. The content fragment design had to balance the overhead penalty of a large number of fragments against the hit ratio improvement obtained under the same conditions.
We defined COFRADIAS framework to
Acknowledgments
This work has been financed by the Spanish Ministry of Education and Science through the TIN11-23889 project.
References (26)
- et al., "Mining web logs to improve hit ratios of prefetching and caching," Knowledge-Based Systems (2008)
- et al., "A new approach for a proxy-level web caching mechanism," Decision Support Systems (2008)
- et al., "A clustering-based prefetching scheme on a web cache environment," Computers and Electrical Engineering (2008)
- et al., "Characterization and analysis of user profiles in online video sharing systems," Journal of Information and Data Management (2010)
- et al., "Characterizing user behavior in online social networks"
- et al., "A fragment-based approach for efficiently creating dynamic web content," ACM Transactions on Internet Technologies (2005)
- et al., "Proxy-based acceleration of dynamically generated content on the world wide web: an approach and implementation"
- et al., "YouTube traffic characterization: a view from the edge"
- et al., "The applicability of balanced ESI for web caching – a proposed algorithm and a case of study"
- et al., "Rule-based system to improve performance on mash-up web applications"
- "Evaluation of a fragment-optimized content aggregation web system"
- "Improving web cache performance via adaptive content fragmentation design"
- "The MACE approach for caching mashups," International Journal of Web Services Research
Cited by (6)
- A lightweight decentralized service placement policy for performance optimization in fog computing, Journal of Ambient Intelligence and Humanized Computing, 2019
- A Qualitative Study of Application-Level Caching, IEEE Transactions on Software Engineering, 2017
- Approach to performance optimization of mashup operation, Ruan Jian Xue Bao/Journal of Software, 2015
- Extending JMeter to allow for web structure mining, Proceedings of the 5th International Conference on Internet Technologies and Applications (ITA 2013), 2013
Dr. Carlos Guerrero is an assistant professor of computer architecture and technology at the Computer Science Department of the University of the Balearic Islands. His research interests are web performance, web engineering, web applications, data mining and intelligent systems. He has taken part in seven research projects (national and international) and has published around 12 papers in different international conferences and journals. He has been member of the program committee of several international conferences.
Dr. Isaac Lera is an associate lecturer at the Computer Science Department of the University of the Balearic Islands. Throughout his research activity, he has participated in different local, national and international projects and has published around 10 articles in conferences and journals. He is interested in topics such as system performance, web services, semantic representations, and ontology engineering.
Dr. Carlos Juiz is an associate professor at the University of the Balearic Islands (UIB), Spain. He is co-author of more than 150 papers, published reviews and book chapters. He is a senior member of the IEEE and of the ACM, and a member of the Steering Committee of the Workshop on Software and Performance. He organized the international Workshop on Middleware and Performance (WOMP 2006) and the European Performance Engineering Workshop (EPEW 2008). He is also a member of ARTEMISIA (Advanced Research & Technology for Embedded Intelligent Systems Industrial Association) and of the standardization committee of NESSI (Networked European Software and Services Initiative). Carlos Juiz is an invited expert of the International Telecommunications Union (ITU).