Skip to main content

Automatic Physical Database Tuning Middleware for Web-Based Applications

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2011)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6909))

  • 688 Accesses

Abstract

In this paper we conceptualize the database layout problem as a state space search problem. A state is a given assignment of tables to computer servers. We begin with a database and collect, for use as a workload input, a sequence of queries that were executed during normal usage of the database. The operators in the search are to fully replicate, horizontally partition, vertically partition, and de-normalize a table. We do a time intensive search over different table layouts, and at each iteration, physically create the configurations, and evaluate the total throughput of the system. We report our empirical results of two forms. First, we empirically validate as facts the heuristics that Database Administrators (DBAs) currently use as in doing this task manually: for tables that have a high ratio of update, delete, and insert to retrieval queries one should horizontally partition, but for a small ratio one should fully replicate a table. Such rules of thumb are reasonable, however we want to parameterize some common guidelines that DBAs can use. Our second empirical result is that we applied this search to our existing data test case and found a reliable increase in total system throughput. The search over layouts is very expensive, but we argue that our method is practical and useful, as entities trying to scale up their Web-based applications would be perfectly happy to spend a few weeks of CPU time to increase their system throughput (and potentially reduce the investment in hardware). To make this search more practical, we want to learn reasonable rules to guide the search to eliminate many layout configurations that are not very likely to succeed. The second aspect of our project (not reported here) is to use the created configurations as input into a machine learning system, to create general rules about when to use the different layout operators.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, A.: Hadoopdb: an architectural hybrid of mapreduce and dbms technologies for analytical workloads. Proc. VLDB Endow. 2, 922–933 (2009)

    Article  Google Scholar 

  2. Agrawal, S., Chaudhuri, S., Kollar, L., Marathe, A., Narasayya, V., Syamala, M.: Database tuning advisor for microsoft sql server 2005. In: Proceedings of VLDB, pp. 1110–1121 (2004)

    Google Scholar 

  3. Agrawal, S., Chaudhuri, S., Narasayya, V.R.: Automated selection of materialized views and indexes in sql databases. In: Proceedings of the 26th International Conference on Very Large Data Bases, VLDB 2000, pp. 496–505. Morgan Kaufmann Publishers Inc., San Francisco (2000)

    Google Scholar 

  4. Amiri, K., Park, S., Tewari, R., Padmanabhan, S.: DBProxy: A Dynamic Data Cache for Web Applications. In: IEEE Int’l Conference on Data Engineering (ICDE), Bangalore, India (March 2003)

    Google Scholar 

  5. Dataupia. Dataupia satori server (2008), http://www.dataupia.com/pdfs/productoverview/Dataupia%20Product%20Overview.pdf

  6. Davidson, G.S., Boyack, K.W., Zacharski, R.A., Helmreich, S.C., Cowie, J.R.: Data-centric computing with the netezza architecture. In: Sandia Report, Unlimited Release (April 2006)

    Google Scholar 

  7. Dean, J., Ghemawat, S., Inc, G.: Mapreduce: simplified data processing on large clusters. In: OSDI 2004: Proceedings of the 6th Conference on Symposium on Opearting Systems Design and Implementation. USENIX Association (2004)

    Google Scholar 

  8. Greenplum. Greenplum database 3.2 administrator guide (2008), http://docs.huihoo.com/greenplum/GPDB-3.2-AdminGuide.pdf

  9. Groothuyse, T., Sivasubramanian, S., Pierre, G.: GlobeTP: Template-Based Database Replication for Scalable Web Applications. In: Int’l World Wide Web Conf (WWW), Alberta, Canada (May 2007)

    Google Scholar 

  10. P. G. D. Group. Postgresql (2011), http://www.postgresql.org

  11. Heffernan, N.T., Turner, T.E., Lourenco, A.L.N., Macasek, M.A., Nuzzo-Jones, G., Koedinger, K.R.: The ASSISTment Builder: Towards an Analysis of Cost Effectiveness of ITS creation. In: FLAIRS, Florida, USA (2006)

    Google Scholar 

  12. MacFarland, A.: The speed of paraccels data warehousing solution changes the economics of business insight (2010), http://www.clipper.com/research/TCG2010008.pdf

  13. Olston, C., Manjhi, A., Garrod, C., Ailamaki, A., Maggs, B.M., Mowry, T.C.: A scalability service for dynamic web applications. In: Proc. CIDR, pp. 56–69 (2005)

    Google Scholar 

  14. Papadomanolakis, E., Ailamaki, A.: Autopart: Automating schema design for large scientific databases using data partitioning. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 383–392. IEEE Computer Society, Los Alamitos (2004)

    Google Scholar 

  15. A. O. W. Paper. A technical overview of the sun oracle database machine and exadata storage server (2010), http://www.oracle.com/technetwork/database/exadata/exadata-technical-whitepaper-134575.pdf

  16. Patvarczki, J., Mani, M., Heffernan, N.: Performance driven database design for scalable web applications. In: Grundspenkis, J., Morzy, T., Vossen, G. (eds.) ADBIS 2009. LNCS, vol. 5739, pp. 43–58. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  17. Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: SIGMOD 2009: Proceedings of the 35th SIGMOD International Conference on Management of Data, pp. 165–178. ACM, New York (2009)

    Google Scholar 

  18. Plattner, C., Alonso, G.: Ganymed: Scalable replication for transactional web applications. In: Jacobsen, H.-A. (ed.) Middleware 2004. LNCS, vol. 3231, pp. 155–174. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  19. Plattner, C., Alonso, G., Özsu, M.T.: DbFarm: A scalable cluster for multiple databases. In: van Steen, M., Henning, M. (eds.) Middleware 2006. LNCS, vol. 4290, pp. 180–200. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  20. Ramamurthy, R., DeWitt, D.J., Su, Q.: A case for fractured mirrors. In: International Conference on Very Large Databases, pp. 430–441. Morgan Kaufmann Publishers, San Francisco (2002)

    Chapter  Google Scholar 

  21. Rao, J., Zhang, C., Lohman, G., Megiddo, N., Rao, J., Zhang, C., Megiddo, N.: Automating physical database design in a parallel database. In: Proc. 2002 ACM SIGMOD, pp. 558–569 (2002)

    Google Scholar 

  22. Shafer, J., Rixner, S., Cox, A.L.: The hadoop distributed filesystem: Balancing portability and performance (April 2010)

    Google Scholar 

  23. Sivasubramanian, S., Pierre, G., van Steen, M.: GlobeDB: Autonomic Data Replication for Web Applications. In: Int’l World Wide Web Conf (WWW), Chiba, Japan (May 2005)

    Google Scholar 

  24. Ślezak, D., Eastwood, V.: Data warehouse technology by infobright. In: Proceedings of the 35th SIGMOD International Conference on Management of Data SIGMOD 2009, pp. 841–846. ACM, New York (2009)

    Google Scholar 

  25. Sue Clarke, T.: Butler group research paper (October 2000), http://www.teradata.com/library/pdf/butler_100101.pdf

  26. Vermeij, M., Quak, W., Kersten, M., Nes, N.: Monetdb, a novel spatial column-store dbms. In: Netterberg, I., Coetzee, S. (eds.) Academic Proceedings of the 2008 Free and Open Source for Geospatial (FOSS4G) Conference, OSGeo, pp. 193–199 (2008)

    Google Scholar 

  27. Vertica. The vertica analytic database–introducing a new era in dbms performance and efficiency (2009), http://www.redhat.com/solutions/intelligence/collateral/vertica_new_era_in_dbms_performance.pdf

  28. Winter, R.: Hp neoview architecture and performance (2009), http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA2-6924ENW.pdf

  29. Zilio, D.C., Jhingran, A., Padmanabhan, S.: Partitioning key selection for a shared-nothing parallel database system. IBM Research Report RC (1994)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Patvarczki, J., Heffernan, N.T. (2011). Automatic Physical Database Tuning Middleware for Web-Based Applications. In: Eder, J., Bielikova, M., Tjoa, A.M. (eds) Advances in Databases and Information Systems. ADBIS 2011. Lecture Notes in Computer Science, vol 6909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23737-9_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-23737-9_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23736-2

  • Online ISBN: 978-3-642-23737-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics