On demand synchronization and load distribution for database grid-based Web applications
Introduction
Applying acceleration solutions for Web applications and content distribution has received a lot of attention in the Web and database communities. At the same time, many database and application server vendors are beginning to integrate Web acceleration through data caching into their software. One example is Oracle 9i [13], which features a suite of application server, Web server, and data cache for deployment at data centers to accelerate the delivery of dynamic content. Akamai [6], in turn, extended the concept of the content distribution network (CDN) to the application distribution network (ADN) by caching and executing stand-alone, self-contained J2EE-based applications (e.g. EJB) at edge servers.
Wide-area database replication technologies and availability of data centers allow database copies to be distributed across the network. The goal of this approach is to offset the high cost of replica synchronization by moving data closer to the users (similar to caching, in which data is moved closer to the users to reduce network latency). Such data center architectures make deploying a “distributed” Web site more flexible. However, the system architecture of data center-hosted Web applications has many drawbacks that prevent it from being considered for many e-commerce applications. In these applications, content freshness, request response time, application result precision, and transaction throughput need to be maintained at a very high level, and sometimes they need to be guaranteed to be within an acceptable threshold specified as part of a QoS agreement.
In this paper, we investigate the issues of data center synchronization and query routing for load and precision critical Web applications. The complexity arises because all of these load and precision sensitive requirements need to be met in all data centers, and because application errors must be estimated in response to database content changes. We develop an adaptive data center synchronization technique to effectively coordinate data center synchronization and request routing for load balancing. We have extensively tested our technique. The experiments show the effectiveness and adaptiveness of our solution in maintaining both application result precision and response time for transactions as part of the QoS specification. We also evaluated the scalability of our algorithm on an application distribution network (ADN) with 15 data centers. The experimental results show that our algorithm is able to schedule just-in-time synchronizations and dynamic load distribution even when the load reaches close to 90% of overall system capacity.
Compared with most existing work that focuses on either minimizing response time or achieving QoS (i.e. Quality of Service) for response time, this paper focuses on achieving the following goals at the same time:
- QoS for response time through load distribution; and
- QoS for application result error threshold through on demand synchronization.
The rest of the paper is organized as follows: In Section 2, we describe the current system architecture of data center-hosted applications and point out its drawbacks. In Section 3, we present our solution to these drawbacks on top of the existing data center architecture for supporting load and precision sensitive Web applications. In Section 4, we describe how to estimate application errors. In Section 5, we give details of the proposed adaptive data center synchronization scheme. In Section 6, we present the results of experiments and give our analysis. In Sections 7 and 8, we compare our approach with other algorithms and evaluate its scalability. In Section 9, we summarize related work in the field and compare it with our work. Finally, we give our concluding remarks in Section 10.
Section snippets
System architecture of data center-hosted Web applications
Wide-area database replication technology and availability of data centers allow database copies to be distributed across the network. The goal of this approach is to reduce network latency and bandwidth usage by moving content closer to the users, which in turn offsets the high cost of replica synchronization. A data center-hosted Web application requires a complete e-commerce Web site suite: Web server (WS), application server (AS), and database management system (DBMS) to be distributed
Adaptive data center synchronization
To address the drawbacks of the existing data center architecture and to support response time and precision sensitive applications, we propose an architecture that aims at building an efficient solution for synchronizing load and precision critical Web applications on top of an existing commercially available data center-hosted Web application system architecture.
In supporting content sensitive applications, each application is expected to specify the error bound threshold within which
Estimating precision of application results based on statistics
To schedule our updates correctly, we need to predict the difference between the actual query result (Qa), i.e. the result of the query run on the master, and the current query result (Qc), which is evaluated using the tables stored at the data center. To this end, we investigate various approaches and evaluate them experimentally. We describe two major approaches.
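As a minimal illustration of the quantity being predicted, the sketch below computes the difference between Qa and Qc for a single aggregate query. The table contents, the sum query, and the relative-error definition are assumptions made for illustration only; the two estimation approaches investigated in this section are not reproduced here.

```python
def relative_error(q_actual: float, q_current: float) -> float:
    """Relative difference between the master result (Qa) and the data center result (Qc)."""
    if q_actual == 0:
        return 0.0 if q_current == 0 else float("inf")
    return abs(q_actual - q_current) / abs(q_actual)


# Hypothetical inventory table: item -> quantity.
master_table = {"a": 100, "b": 50, "c": 50}       # up to date
data_center_table = {"a": 100, "b": 40, "c": 50}  # has not yet received one update to "b"

q_a = sum(master_table.values())       # Qa: query evaluated on the master
q_c = sum(data_center_table.values())  # Qc: same query evaluated on the stale copy
error = relative_error(q_a, q_c)
```

In practice the master result is not available at the data center without a round trip, which is why the error has to be estimated rather than computed directly as it is here.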
Adaptive synchronization of data centers and load distribution
In Section 3, we presented the system architecture for data center-hosted applications with the deployment of the proposed adaptive data center synchronization to ensure response time and application precision guarantees. In this section, we describe the adaptive data center synchronization and load distribution techniques in more detail.
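The interplay between the two guarantees can be sketched as follows. This is not the scheduling algorithm of this paper; it is a simplified illustration that assumes each data center exposes a current load, a load limit, and an estimated application error. The `DataCenter` fields, the threshold value, and the least-loaded routing rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    name: str
    load: float             # current load, as a fraction of capacity
    load_limit: float       # highest load that still meets the response-time QoS
    estimated_error: float  # estimated application error of the local copy


def needs_sync(dc: DataCenter, error_threshold: float) -> bool:
    """On-demand rule: synchronize only when the error bound is at risk."""
    return dc.estimated_error >= error_threshold


def route_request(centers: List[DataCenter], error_threshold: float) -> Optional[DataCenter]:
    """Pick the least-loaded data center that satisfies both the load and the error limit."""
    eligible = [
        dc for dc in centers
        if dc.load < dc.load_limit and dc.estimated_error < error_threshold
    ]
    return min(eligible, key=lambda dc: dc.load) if eligible else None


centers = [
    DataCenter("dc1", load=0.85, load_limit=0.80, estimated_error=0.01),  # overloaded
    DataCenter("dc2", load=0.40, load_limit=0.80, estimated_error=0.06),  # too stale
    DataCenter("dc3", load=0.55, load_limit=0.80, estimated_error=0.02),
]
to_sync = [dc.name for dc in centers if needs_sync(dc, error_threshold=0.05)]
target = route_request(centers, error_threshold=0.05)
```

With these made-up values, only `dc2` is flagged for synchronization, and requests are routed to `dc3` until `dc1` sheds load or `dc2` is refreshed.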
Experimental evaluations
We have conducted a comprehensive set of experiments to evaluate the proposed system architecture and the Adaptive Load and Precision Critical Scheduler (ALPCS) algorithm. In this section, we describe the experimental results. We start with the settings for the experiments.
Comparisons with other algorithms
The round robin algorithm is one of the most frequently deployed schemes for data center synchronization. We observe that round robin data center synchronization schemes are suitable when the variability in the loads, application capacity, load limits, and other parameters at data centers is low. In particular, round robin data center synchronization schemes do not work when the loads, application capability, and load limits of data centers are diverse and change
Evaluation of scalability
In addition to the experiments in Sections 6 and 7, we also evaluated the scalability of our algorithm in scheduling a large number of data centers, especially when the user requests and transactions are close to the overall system capacity. In this section, we describe the experimental results on a large-scale application delivery network (ADN) of 15 data centers with loads close to 90% of its overall capacity. We
Related work
The update scheduling problem in load and precision sensitive systems can be compared to the problem of scheduling tasks on a single machine in order to minimize the weighted completion time under precedence constraints [10], which has been proved to be NP-hard for the general case [21].
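For intuition about this objective, the sketch below computes the total weighted completion time of a job order on a single machine. It also applies Smith's ratio rule (sort jobs by processing time divided by weight), which is known to be optimal only for the simpler case without precedence constraints; it does not solve the NP-hard constrained problem. The job names, processing times, and weights are made-up values.

```python
def weighted_completion_time(jobs, order):
    """Sum of weight * completion time when jobs run back to back in the given order."""
    clock, total = 0, 0
    for name in order:
        processing_time, weight = jobs[name]
        clock += processing_time   # completion time of this job
        total += weight * clock
    return total


def smith_order(jobs):
    """Smith's rule: ascending processing_time / weight (no precedence constraints)."""
    return sorted(jobs, key=lambda name: jobs[name][0] / jobs[name][1])


# Hypothetical update jobs: name -> (processing_time, weight).
jobs = {"u1": (3, 1), "u2": (1, 2), "u3": (2, 2)}
order = smith_order(jobs)
cost = weighted_completion_time(jobs, order)
```

Once precedence constraints are added, a ratio-sorted order may violate them, which is where the hardness of the general case comes from.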
In [20], the problem of updating Web caches when their back-end databases receive updates is investigated. At the conceptual level, we can view the DCs as Web caches but there is no parallel for the error bounds
Conclusion
Wide-area database replication and CDN technologies make data centers an ideal choice to host and run database-driven Web applications that demand fast response time and reliability. We point out that such an architecture lacks the capability to support many content sensitive Web-based applications, such as applications for financial markets, corporate cash management, supply chain and inventory management for large distributed retail chain stores, and decision making systems. We propose a new system
Acknowledgements
The authors would like to acknowledge the contribution of Ullas Nambiar in the early version of this work.
Wen-Syan Li is a Senior Research Staff Member at NEC Laboratories America, Inc. He received his Ph.D. in Computer Science from Northwestern University in December 1995. He also holds an MBA degree. His main research interests include content delivery network, multimedia/hypermedia/document databases, WWW, e-commerce, and information retrieval. Wen-Syan is the recipient of the first NEC USA Achievement Award for his contributions in technology innovation.
References (30)
- et al., On the network impact of dynamic server selection, Computer Networks (1999)
- Sequencing jobs to minimize total weighted completion time, Annals of Discrete Mathematics (1978)
- A. Heddaya, S. Mirdad, D. Yates, Diffusion-based caching: WebWave, in: Proceedings of the 1997 NLANR Web Caching...
- M.R. Korupolu, M. Dahlin, Coordinated placement and replacement for large-scale distributed caches, in: Proceedings of...
- A. Heddaya, S. Mirdad, WebWave: globally load balanced fully distributed caching of hot published documents, in:...
- B. Adelberg, H. Garcia-Molina, B. Kao, Applying update streams in a soft real-time database system, in: Proceedings of...
- B. Adelberg, B. Kao, H. Garcia-Molina, Database support for efficiently maintaining derived data, in: Proceedings of...
- Akamai Technology. Information available at...
- M. Altinel, C. Bornhoevd, S. Krishnamurthy, C. Mohan, H. Pirahesh, B. Reinwald, Cache tables: paving the way for an...
- C. Bornhövd, M. Altinel, S. Krishnamurthy, C. Mohan, H. Pirahesh, B. Reinwald, Dbcache: middle-tier database caching...
- Precedence constrained scheduling to minimize weighted completion time, Discrete Applied Mathematics
- Transactional client-server cache consistency: alternatives and performance, ACM Transactions on Database Systems
Kemal Altintas is a Ph.D. student in Information and Computer Science Department, University of California, Irvine. He received his BS and MS degrees in Computer Science from Bilkent University, Turkey. His research interests include topics on data and information management systems, information retrieval and extraction from text and speech and natural language processing.
Murat Kantarcıoğlu is a Ph.D. candidate at Purdue University. He has a Master's degree in Computer Science from Purdue University and a Bachelor's degree in Computer Engineering from Middle East Technical University, Ankara, Turkey. His research interests include data mining, database security and information security. He is a student member of ACM.
1 This work was performed when the author was with NEC Laboratories America, Inc. He is currently affiliated with IBM Almaden Research Center and can be reached at [email protected].