Abstract
In this paper we conceptualize the database layout problem as a state space search problem. A state is a given assignment of tables to computer servers. We begin with a database and collect, for use as a workload input, a sequence of queries that were executed during normal usage of the database. The operators in the search are to fully replicate, horizontally partition, vertically partition, and de-normalize a table. We do a time intensive search over different table layouts, and at each iteration, physically create the configurations, and evaluate the total throughput of the system. We report our empirical results of two forms. First, we empirically validate as facts the heuristics that Database Administrators (DBAs) currently use as in doing this task manually: for tables that have a high ratio of update, delete, and insert to retrieval queries one should horizontally partition, but for a small ratio one should fully replicate a table. Such rules of thumb are reasonable, however we want to parameterize some common guidelines that DBAs can use. Our second empirical result is that we applied this search to our existing data test case and found a reliable increase in total system throughput. The search over layouts is very expensive, but we argue that our method is practical and useful, as entities trying to scale up their Web-based applications would be perfectly happy to spend a few weeks of CPU time to increase their system throughput (and potentially reduce the investment in hardware). To make this search more practical, we want to learn reasonable rules to guide the search to eliminate many layout configurations that are not very likely to succeed. The second aspect of our project (not reported here) is to use the created configurations as input into a machine learning system, to create general rules about when to use the different layout operators.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, A.: Hadoopdb: an architectural hybrid of mapreduce and dbms technologies for analytical workloads. Proc. VLDB Endow. 2, 922–933 (2009)
Agrawal, S., Chaudhuri, S., Kollar, L., Marathe, A., Narasayya, V., Syamala, M.: Database tuning advisor for microsoft sql server 2005. In: Proceedings of VLDB, pp. 1110–1121 (2004)
Agrawal, S., Chaudhuri, S., Narasayya, V.R.: Automated selection of materialized views and indexes in sql databases. In: Proceedings of the 26th International Conference on Very Large Data Bases, VLDB 2000, pp. 496–505. Morgan Kaufmann Publishers Inc., San Francisco (2000)
Amiri, K., Park, S., Tewari, R., Padmanabhan, S.: DBProxy: A Dynamic Data Cache for Web Applications. In: IEEE Int’l Conference on Data Engineering (ICDE), Bangalore, India (March 2003)
Dataupia. Dataupia satori server (2008), http://www.dataupia.com/pdfs/productoverview/Dataupia%20Product%20Overview.pdf
Davidson, G.S., Boyack, K.W., Zacharski, R.A., Helmreich, S.C., Cowie, J.R.: Data-centric computing with the netezza architecture. In: Sandia Report, Unlimited Release (April 2006)
Dean, J., Ghemawat, S., Inc, G.: Mapreduce: simplified data processing on large clusters. In: OSDI 2004: Proceedings of the 6th Conference on Symposium on Opearting Systems Design and Implementation. USENIX Association (2004)
Greenplum. Greenplum database 3.2 administrator guide (2008), http://docs.huihoo.com/greenplum/GPDB-3.2-AdminGuide.pdf
Groothuyse, T., Sivasubramanian, S., Pierre, G.: GlobeTP: Template-Based Database Replication for Scalable Web Applications. In: Int’l World Wide Web Conf (WWW), Alberta, Canada (May 2007)
P. G. D. Group. Postgresql (2011), http://www.postgresql.org
Heffernan, N.T., Turner, T.E., Lourenco, A.L.N., Macasek, M.A., Nuzzo-Jones, G., Koedinger, K.R.: The ASSISTment Builder: Towards an Analysis of Cost Effectiveness of ITS creation. In: FLAIRS, Florida, USA (2006)
MacFarland, A.: The speed of paraccels data warehousing solution changes the economics of business insight (2010), http://www.clipper.com/research/TCG2010008.pdf
Olston, C., Manjhi, A., Garrod, C., Ailamaki, A., Maggs, B.M., Mowry, T.C.: A scalability service for dynamic web applications. In: Proc. CIDR, pp. 56–69 (2005)
Papadomanolakis, E., Ailamaki, A.: Autopart: Automating schema design for large scientific databases using data partitioning. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 383–392. IEEE Computer Society, Los Alamitos (2004)
A. O. W. Paper. A technical overview of the sun oracle database machine and exadata storage server (2010), http://www.oracle.com/technetwork/database/exadata/exadata-technical-whitepaper-134575.pdf
Patvarczki, J., Mani, M., Heffernan, N.: Performance driven database design for scalable web applications. In: Grundspenkis, J., Morzy, T., Vossen, G. (eds.) ADBIS 2009. LNCS, vol. 5739, pp. 43–58. Springer, Heidelberg (2009)
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: SIGMOD 2009: Proceedings of the 35th SIGMOD International Conference on Management of Data, pp. 165–178. ACM, New York (2009)
Plattner, C., Alonso, G.: Ganymed: Scalable replication for transactional web applications. In: Jacobsen, H.-A. (ed.) Middleware 2004. LNCS, vol. 3231, pp. 155–174. Springer, Heidelberg (2004)
Plattner, C., Alonso, G., Özsu, M.T.: DbFarm: A scalable cluster for multiple databases. In: van Steen, M., Henning, M. (eds.) Middleware 2006. LNCS, vol. 4290, pp. 180–200. Springer, Heidelberg (2006)
Ramamurthy, R., DeWitt, D.J., Su, Q.: A case for fractured mirrors. In: International Conference on Very Large Databases, pp. 430–441. Morgan Kaufmann Publishers, San Francisco (2002)
Rao, J., Zhang, C., Lohman, G., Megiddo, N., Rao, J., Zhang, C., Megiddo, N.: Automating physical database design in a parallel database. In: Proc. 2002 ACM SIGMOD, pp. 558–569 (2002)
Shafer, J., Rixner, S., Cox, A.L.: The hadoop distributed filesystem: Balancing portability and performance (April 2010)
Sivasubramanian, S., Pierre, G., van Steen, M.: GlobeDB: Autonomic Data Replication for Web Applications. In: Int’l World Wide Web Conf (WWW), Chiba, Japan (May 2005)
Ślezak, D., Eastwood, V.: Data warehouse technology by infobright. In: Proceedings of the 35th SIGMOD International Conference on Management of Data SIGMOD 2009, pp. 841–846. ACM, New York (2009)
Sue Clarke, T.: Butler group research paper (October 2000), http://www.teradata.com/library/pdf/butler_100101.pdf
Vermeij, M., Quak, W., Kersten, M., Nes, N.: Monetdb, a novel spatial column-store dbms. In: Netterberg, I., Coetzee, S. (eds.) Academic Proceedings of the 2008 Free and Open Source for Geospatial (FOSS4G) Conference, OSGeo, pp. 193–199 (2008)
Vertica. The vertica analytic database–introducing a new era in dbms performance and efficiency (2009), http://www.redhat.com/solutions/intelligence/collateral/vertica_new_era_in_dbms_performance.pdf
Winter, R.: Hp neoview architecture and performance (2009), http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA2-6924ENW.pdf
Zilio, D.C., Jhingran, A., Padmanabhan, S.: Partitioning key selection for a shared-nothing parallel database system. IBM Research Report RC (1994)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Patvarczki, J., Heffernan, N.T. (2011). Automatic Physical Database Tuning Middleware for Web-Based Applications. In: Eder, J., Bielikova, M., Tjoa, A.M. (eds) Advances in Databases and Information Systems. ADBIS 2011. Lecture Notes in Computer Science, vol 6909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23737-9_26
Download citation
DOI: https://doi.org/10.1007/978-3-642-23737-9_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23736-2
Online ISBN: 978-3-642-23737-9
eBook Packages: Computer ScienceComputer Science (R0)