
A workload-driven approach to database query processing in the cloud

The Journal of Supercomputing

Abstract

This paper is concerned with data provisioning services (information search, retrieval, storage, etc.) that operate over a large, heterogeneous information repository. Increasingly, such services are hosted and delivered through Cloud infrastructures. Although these systems are becoming popular, existing resource management methods (e.g. load-balancing techniques) neither take workload patterns into account nor perform well on non-uniformly distributed datasets. If these problems can be solved, this class of services can be made to operate in a more scalable, efficient, and reliable manner.
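The skew problem can be made concrete. The following Python sketch splits a key range into equal-width slices, as a naive range partitioner might. The key is a hypothetical publication year, and the invented dataset doubles in volume every year. The data and the function name `equal_width_partition` are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter

def equal_width_partition(keys, lo, hi, n_parts):
    """Count how many keys fall into each of n_parts equal-width slices of [lo, hi)."""
    width = (hi - lo) / n_parts
    counts = Counter()
    for k in keys:
        # Clamp so a key at the upper edge still lands in the last slice.
        idx = min(int((k - lo) / width), n_parts - 1)
        counts[idx] += 1
    return [counts[i] for i in range(n_parts)]

# Hypothetical skewed archive: the number of stories doubles every year,
# so recent years are far denser than early ones.
keys = [year for year in range(2000, 2010) for _ in range(2 ** (year - 2000))]

sizes = equal_width_partition(keys, 2000, 2010, 4)
print(sizes)  # [7, 24, 224, 768]
```

In this toy case the last slice holds 768 of the 1,023 records, roughly 75%. Distributing requests evenly across nodes therefore does not distribute the work evenly.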

The main contribution of this paper is an approach that combines proprietary cloud-based load-balancing techniques with density-based partitioning for efficient range query processing over relational database-as-a-service deployments in cloud computing environments. The study is conducted on a real-world data provisioning service that manages a large historical news database from Thomson Reuters. The proposed approach has been implemented and tested as a multi-tier web application suite consisting of load-balancing, application, and database layers. We validated the approach through a set of rigorous performance evaluation experiments on the Amazon EC2 infrastructure. The results show that augmenting a cloud-based load-balancing service (e.g. Amazon Elastic Load Balancer) with workload characterization intelligence (density and distribution of data; composition of queries) offers significant benefits with regard to overall system performance (i.e. query latency and database service throughput).
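Equi-depth (quantile) partitioning is one common way to make partitions follow data density. It chooses split points so that each partition holds about the same number of records, and a range query is then sent only to the partitions its range overlaps. The Python sketch below illustrates this idea under our own assumptions. The function names are hypothetical, and the paper's density-based scheme may differ in detail.

```python
import bisect
from collections import Counter

def equi_depth_boundaries(keys, n_parts):
    """Split points chosen so each partition holds roughly len(keys) / n_parts keys."""
    s = sorted(keys)
    return [s[(i * len(s)) // n_parts] for i in range(1, n_parts)]

def partition_of(key, bounds):
    """Partition index for key; partition i covers [bounds[i-1], bounds[i])."""
    return bisect.bisect_right(bounds, key)

def route_range_query(lo, hi, bounds):
    """Indices of the partitions that must be queried for lo <= key <= hi."""
    return list(range(partition_of(lo, bounds), partition_of(hi, bounds) + 1))

# Hypothetical skewed keys: dense near 0, sparse at the high end.
keys = [t * t for t in range(100)]
bounds = equi_depth_boundaries(keys, 4)
print(bounds)                                                  # [625, 2500, 5625]
print(sorted(Counter(partition_of(k, bounds) for k in keys).items()))
# [(0, 25), (1, 25), (2, 25), (3, 25)]
print(route_range_query(100, 1000, bounds))                    # [0, 1]
print(route_range_query(3000, 4000, bounds))                   # [2]
```

With these boundaries, every partition stores the same number of records, and a narrow range query touches only one or two partitions. A load balancer that knows the boundaries could use them when directing requests.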



Author information

Correspondence to Rajiv Ranjan.

About this article

Cite this article

Guabtni, A., Ranjan, R. & Rabhi, F.A. A workload-driven approach to database query processing in the cloud. J Supercomput 63, 722–736 (2013). https://doi.org/10.1007/s11227-011-0717-y


  • DOI: https://doi.org/10.1007/s11227-011-0717-y
