Abstract
This paper is concerned with data provisioning services (information search, retrieval, storage, etc.) that operate over a large and heterogeneous information repository. Increasingly, this class of services is being hosted and delivered through Cloud infrastructures. Although such systems are becoming popular, existing resource management methods (e.g. load-balancing techniques) neither consider workload patterns nor perform well when subjected to non-uniformly distributed datasets. Solving these problems would allow this class of services to operate in a more scalable, efficient, and reliable manner.
The main contribution of this paper is an approach that combines proprietary cloud-based load-balancing techniques and density-based partitioning for efficient range query processing across relational database-as-a-service deployments in cloud computing environments. The study is conducted over a real-world data provisioning service that manages a large historical news database from Thomson Reuters. The proposed approach has been implemented and tested as a multi-tier web application suite consisting of load-balancing, application, and database layers. We have validated our approach by conducting a set of performance evaluation experiments using the Amazon EC2 infrastructure. The results show that augmenting a cloud-based load-balancing service (e.g. Amazon Elastic Load Balancer) with workload characterization intelligence (density and distribution of data; composition of queries) offers significant benefits with regard to overall system performance (i.e. query latency and database service throughput).
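The abstract does not spell out the partitioning algorithm, so the following is only an illustrative sketch of the general idea behind density-based partitioning for range queries, not the authors' method. It assumes a single numeric key (for example, a publication day number) and uses equi-depth split points so that each partition holds a similar number of rows even when the data is skewed. The function names `equi_depth_boundaries` and `partitions_for_range` and the sample data are hypothetical.

```python
from bisect import bisect_right


def equi_depth_boundaries(keys, n_parts):
    """Return n_parts - 1 split points so each partition holds a similar row count.

    Assumption: an equi-depth split stands in for "density-based" partitioning.
    """
    if not keys or n_parts < 1:
        raise ValueError("need at least one key and one partition")
    ordered = sorted(keys)
    step = len(ordered) / n_parts
    return [ordered[int(i * step)] for i in range(1, n_parts)]


def partitions_for_range(boundaries, lo, hi):
    """Return the partition indices that a range query [lo, hi] must touch.

    Partition i holds keys k with boundaries[i - 1] <= k < boundaries[i].
    """
    first = bisect_right(boundaries, lo)
    last = bisect_right(boundaries, hi)
    return list(range(first, last + 1))


# Hypothetical skewed data: few old items, many recent ones.
days = [1, 2, 3, 50, 51, 52, 53, 54, 55, 56, 57, 58]
boundaries = equi_depth_boundaries(days, 4)
touched = partitions_for_range(boundaries, 51, 54)
```

With split points chosen this way, a query over the dense recent range is spread across several partitions, whereas a uniform split of the key space would send almost all of it to a single one. A router in front of the database layer could use the returned indices to decide which database instances receive each sub-query.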
Cite this article
Guabtni, A., Ranjan, R. & Rabhi, F.A. A workload-driven approach to database query processing in the cloud. J Supercomput 63, 722–736 (2013). https://doi.org/10.1007/s11227-011-0717-y
DOI: https://doi.org/10.1007/s11227-011-0717-y