On demand synchronization and load distribution for database grid-based Web applications

https://doi.org/10.1016/j.datak.2004.05.003

Abstract

With the availability of content delivery networks (CDN), many database-driven Web applications rely on data centers that host applications and database contents for better performance and higher reliability. However, this architecture raises additional issues associated with database/data center synchronization, query/transaction routing, load balancing, and application result correctness/precision. In this paper, we investigate these issues in the context of data center synchronization for load and precision critical Web applications in a distributed data center infrastructure. We develop a scalable scheme for adaptive synchronization of data centers that maintains the load and application precision requirements. A prototype has been built to evaluate the proposed scheme. The experimental results show that the scheme is effective in maintaining both application result precision and load distribution, and in adapting to traffic patterns and system capacity limits.

Introduction

Applying acceleration solutions to Web applications and content distribution has received a lot of attention in the Web and database communities. At the same time, many database and application server vendors are beginning to integrate Web acceleration through data caching into their software. Examples include Oracle 9i [13], which features a suite of application server, Web server, and data cache components for deployment at data centers to accelerate the delivery of dynamic content. On the other hand, Akamai [6] extended the concept of the content distribution network (CDN) to the application distribution network (ADN) by caching and executing stand-alone, self-contained J2EE-based applications (e.g. EJB) at edge servers.

Wide-area database replication technologies and the availability of data centers allow database copies to be distributed across the network. The goal of this approach is to offset the high cost of replica synchronization by moving data closer to the users, much as caching reduces network latency by doing the same. Such data center architectures make it more flexible to deploy a “distributed” Web site. However, the system architecture of data center-hosted Web applications has many drawbacks that prevent it from being considered for many e-commerce applications, in which content freshness, request response time, application result precision, and transaction throughput need to be maintained at a very high level, and sometimes must be guaranteed to stay within acceptable thresholds specified as part of a QoS agreement.

In this paper, we investigate the issues of data center synchronization and query routing for load and precision critical Web applications. The complexity arises from the need to meet these load and precision requirements at all data centers while also estimating application errors in response to database content changes. We develop an adaptive data center synchronization technique to effectively coordinate data center synchronization and request routing for load balancing. We have extensively tested our technique. The experiments show the effectiveness and adaptiveness of our solution in maintaining both application result precision and transaction response time as part of a QoS specification. We also evaluated the scalability of our algorithm on an application distribution network (ADN) with 15 data centers; the experimental results show that our algorithm is able to schedule just-in-time synchronizations and dynamic load distribution even when the load reaches close to 90% of the overall system capacity.

Compared with most existing work, which focuses either on minimizing response time or on achieving QoS (i.e. Quality of Service) for response time, this paper focuses on achieving the following goals at the same time (a rough illustration of such a combined QoS specification follows the list):

  • QoS for response time through load distribution; and

  • QoS for application result error threshold through on demand synchronization.
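
The paper does not prescribe a concrete format for these per-application QoS parameters. As a rough illustration only, a specification might pair a response-time bound (enforced via load distribution) with a result-error bound (enforced via on-demand synchronization); all names and fields below are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class QoSSpec:
    """Hypothetical per-application QoS specification (illustrative only)."""
    app_id: str
    max_response_time_ms: float   # response-time bound, enforced via load distribution
    max_result_error: float       # result-error bound, enforced via on-demand synchronization

def violates_qos(spec: QoSSpec, observed_response_ms: float, estimated_error: float) -> bool:
    """Return True if either QoS dimension is violated for this application."""
    return (observed_response_ms > spec.max_response_time_ms
            or estimated_error > spec.max_result_error)

# Example: an application tolerating 500 ms responses and 2% result error.
spec = QoSSpec(app_id="inventory-report", max_response_time_ms=500.0, max_result_error=0.02)
print(violates_qos(spec, observed_response_ms=620.0, estimated_error=0.01))  # True
```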


The rest of the paper is organized as follows: In Section 2, we describe the current system architecture of data center-hosted applications and point out its drawbacks. In Section 3, we present our solution to these drawbacks, built on top of the existing data center architecture, for supporting load and precision sensitive Web applications. In Section 4, we describe how to estimate application errors. In Section 5, we give the details of the proposed adaptive data center synchronization scheme. In Section 6, we present the results of our experiments and give our analysis; Section 7 compares our algorithm with other scheduling schemes, and Section 8 evaluates its scalability. In Section 9, we summarize related work in the field and compare it with ours. Finally, we give our concluding remarks in Section 10.

Section snippets

System architecture of data center-hosted Web applications

Wide-area database replication technology and the availability of data centers allow database copies to be distributed across the network. The goal of this approach is to reduce network latency and bandwidth usage by moving content closer to the users, which in turn offsets the high cost of replica synchronization. A data center-hosted Web application requires a complete e-commerce Web site suite: a Web server (WS), an application server (AS), and a database management system (DBMS) to be distributed

Adaptive data center synchronization

To address the drawbacks of the existing data center architecture and to support response time and precision sensitive applications, we propose an architecture that aims at building an efficient solution for synchronizing load and precision critical Web applications on top of an existing commercially available data center-hosted Web application system architecture.

In supporting content sensitive applications, each application is expected to specify the error bound threshold within which

Estimating precision of application results based on statistics

In order to correctly schedule our updates, we need to predict the difference between the actual query result (Qa), i.e. the result of the query run on the master, and the current query result (Qc), which is evaluated using the tables stored at the data center. To achieve this, we investigate various approaches and evaluate them experimentally. We describe two major approaches.
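
The snippet does not detail the two estimation approaches. Purely to convey the underlying idea of bounding |Qa − Qc| from update statistics, the following sketch estimates the relative error of a SUM-style aggregate from statistics about updates that the master has applied but the replica has not yet received; the tracked statistics, the estimator, and all names are assumptions for illustration, not the paper's method.

```python
from dataclasses import dataclass

@dataclass
class PendingUpdateStats:
    """Statistics a master might track per replicated table (assumed, for illustration)."""
    pending_rows_changed: int     # rows inserted/deleted/updated since the last synchronization
    total_rows: int               # current cardinality of the table on the replica
    avg_value_delta: float        # average change in the aggregated column per modified row
    current_aggregate: float      # last known aggregate value (e.g., SUM) on the replica

def estimate_relative_error(stats: PendingUpdateStats) -> float:
    """Estimate |Qa - Qc| / |Qa| for a SUM-style aggregate from pending-update statistics."""
    estimated_drift = stats.pending_rows_changed * stats.avg_value_delta
    estimated_actual = stats.current_aggregate + estimated_drift
    if estimated_actual == 0:
        return float("inf")
    return abs(estimated_drift) / abs(estimated_actual)

# Example: 200 modified rows, each shifting a SUM of 1,000,000 by about 50 on average.
stats = PendingUpdateStats(pending_rows_changed=200, total_rows=100_000,
                           avg_value_delta=50.0, current_aggregate=1_000_000.0)
print(f"estimated relative error: {estimate_relative_error(stats):.3%}")  # ~0.990%
```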

Adaptive synchronization of data centers and load distribution

In Section 3, we presented the system architecture for data center-hosted applications with the deployment of the proposed adaptive data center synchronization to ensure response time and application precision guarantees. In this section, we describe the adaptive data center synchronization and load distribution techniques in more detail.
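
To make the interplay of the two decisions concrete, the sketch below couples on-demand synchronization with load-aware routing: a data center is marked for synchronization only when its estimated error would exceed the application's bound, and requests are routed to the least-loaded data center that still satisfies both QoS bounds. This is not the paper's ALPCS algorithm; the data structures, thresholds, and routing policy are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataCenter:
    name: str
    load: float = 0.0              # current load, as a fraction of this data center's capacity
    estimated_error: float = 0.0   # estimated application result error on this replica
    sync_pending: bool = False

@dataclass
class Coordinator:
    error_bound: float             # per-application error threshold (QoS)
    load_limit: float              # per-data-center load limit (QoS)
    centers: list[DataCenter] = field(default_factory=list)

    def schedule_synchronizations(self) -> None:
        """Mark a data center for synchronization only when its error exceeds the bound."""
        for dc in self.centers:
            if dc.estimated_error > self.error_bound and not dc.sync_pending:
                dc.sync_pending = True   # in a real system: enqueue a sync job against the master

    def route(self) -> Optional[DataCenter]:
        """Route a request to the least-loaded data center satisfying both QoS bounds."""
        eligible = [dc for dc in self.centers
                    if dc.estimated_error <= self.error_bound and dc.load < self.load_limit]
        return min(eligible, key=lambda dc: dc.load, default=None)

coord = Coordinator(error_bound=0.02, load_limit=0.9,
                    centers=[DataCenter("dc-east", load=0.4, estimated_error=0.01),
                             DataCenter("dc-west", load=0.7, estimated_error=0.03)])
coord.schedule_synchronizations()   # dc-west exceeds the error bound and gets a sync scheduled
print(coord.route().name)           # requests are routed to dc-east
```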

Experimental evaluations

We have conducted a comprehensive set of experiments to evaluate the proposed system architecture and the Adaptive Load and Precision Critical Scheduler (ALPCS) algorithm. In this section, we describe the experimental results. We start with the settings for the experiments.

Comparisons with other algorithms

The round robin algorithm is one of the most frequently deployed schemes for data center synchronization. We observe that round robin-style data center synchronization schemes are suitable when the variability in the loads, application capacities, load limits, and other parameters at the data centers is low. In particular, such schemes do not work well when the loads, application capacities, and load limits of the data centers diverge and change

Evaluation of scalability

In addition to the experiments in Sections 6 and 7, we also evaluated our algorithm's scalability in scheduling a large number of data centers, especially when user requests and transactions approach the overall system capacity. In this section, we describe the experimental results for a large-scale application distribution network (ADN) of 15 data centers under loads close to 90% of its overall capacity. We

Related work

The update scheduling problem in load and precision sensitive systems can be compared to the problem of scheduling tasks on a single machine in order to minimize the weighted completion time under precedence constraints [10], which has been proved to be NP-hard for the general case [21].
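
For context, the single-machine problem referenced above can be stated in its standard form (notation chosen here for illustration, not quoted from [10] or [21]): given jobs $1,\dots,n$ with processing times $p_j$, weights $w_j$, and a precedence relation $\prec$, find a schedule $\sigma$ that solves

$$\min_{\sigma}\ \sum_{j=1}^{n} w_j\, C_j(\sigma) \qquad \text{s.t. } C_i(\sigma) \le C_j(\sigma) - p_j \ \text{ whenever } i \prec j,$$

where $C_j(\sigma)$ is the completion time of job $j$ under $\sigma$. One way to read the analogy is that each pending data center synchronization plays the role of a job whose weight reflects the cost of deferring it.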

In [20], the problem of updating Web caches when their back-end databases receive updates is investigated. At the conceptual level, we can view the DCs as Web caches but there is no parallel for the error bounds

Conclusion

Wide-area database replication and CDN technologies make data centers an ideal choice to host and run database-driven Web applications that demand fast response times and high reliability. We point out that such an architecture lacks the capability to support many content sensitive Web-based applications, such as applications for financial markets, corporate cash management, supply chain and inventory management for large distributed retail chain stores, and decision-making systems. We propose a new system

Acknowledgements

The authors would like to acknowledge the contribution of Ullas Nambiar in the early version of this work.


References (30)

  • R.L. Carter et al., On the network impact of dynamic server selection, Computer Networks (1999)
  • E.L. Lawler, Sequencing jobs to minimize total weighted completion time, Annals of Discrete Mathematics (1978)
  • A. Heddaya, S. Mirdad, D. Yates, Diffusion-based caching: WebWave, in: Proceedings of the 1997 NLANR Web Caching...
  • M.R. Korupolu, M. Dahlin, Coordinated placement and replacement for large-scale distributed caches, in: Proceedings of...
  • A. Heddaya, S. Mirdad, WebWave: globally load balanced fully distributed caching of hot published documents, in:...
  • B. Adelberg, H. Garcia-Molina, B. Kao, Applying update streams in a soft real-time database system, in: Proceedings of...
  • B. Adelberg, B. Kao, H. Garcia-Molina, Database support for efficiently maintaining derived data, in: Proceedings of...
  • Akamai Technology. Information available at...
  • M. Altinel, C. Bornhoevd, S. Krishnamurthy, C. Mohan, H. Pirahesh, B. Reinwald, Cache tables: paving the way for an...
  • C. Bornhövd, M. Altinel, S. Krishnamurthy, C. Mohan, H. Pirahesh, B. Reinwald, Dbcache: middle-tier database caching...
  • C. Chekuri et al., Precedence constrained scheduling to minimize weighted completion time, Discrete Applied Mathematics (1999)
  • Y. Chen, R.H. Katz, J.D. Kubiatowicz, Dynamic replica placement for scalable content delivery, in: Peer-to-Peer...
  • J. Cho, H. Garcia-Molina, Synchronizing a database to improve freshness, in: Proceedings of ACM SIGMOD Conference,...
  • Oracle Corp. Available from...
  • M.J. Franklin et al., Transactional client-server cache consistency: alternatives and performance, ACM Transactions on Database Systems (1997)

    Wen-Syan Li is a Senior Research Staff Member at NEC Laboratories America, Inc. He received his Ph.D. in Computer Science from Northwestern University in December 1995. He also holds an MBA degree. His main research interests include content delivery network, multimedia/hypermedia/document databases, WWW, e-commerce, and information retrieval. Wen-Syan is the recipient of the first NEC USA Achievement Award for his contributions in technology innovation.

    Kemal Altintas is a Ph.D. student in Information and Computer Science Department, University of California, Irvine. He received his BS and MS degrees in Computer Science from Bilkent University, Turkey. His research interests include topics on data and information management systems, information retrieval and extraction from text and speech and natural language processing.

    Murat Kantarcıoǧlu is a Ph.D. candidate at Purdue University. He has a Master's degree in Computer Science from Purdue University and a Bachelor's degree in Computer Engineering from Middle East Technical University, Ankara Turkey. His research interests include data mining, database security and information security. He is a student member of ACM.

    1 This work was performed when the author was with NEC Laboratories America, Inc. He is currently affiliated with IBM Almaden Research Center and can be reached at [email protected].
