Automatic Physical Database Tuning Middleware for Web-Based Applications

Patvarczki, Jozsef; Heffernan, Neil T.

doi:10.1007/978-3-642-23737-9_26

Jozsef Patvarczki¹⁹ &
Neil T. Heffernan¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6909))

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

688 Accesses

Abstract

In this paper we conceptualize the database layout problem as a state space search problem. A state is a given assignment of tables to computer servers. We begin with a database and collect, for use as a workload input, a sequence of queries that were executed during normal usage of the database. The operators in the search are to fully replicate, horizontally partition, vertically partition, and de-normalize a table. We do a time intensive search over different table layouts, and at each iteration, physically create the configurations, and evaluate the total throughput of the system. We report our empirical results of two forms. First, we empirically validate as facts the heuristics that Database Administrators (DBAs) currently use as in doing this task manually: for tables that have a high ratio of update, delete, and insert to retrieval queries one should horizontally partition, but for a small ratio one should fully replicate a table. Such rules of thumb are reasonable, however we want to parameterize some common guidelines that DBAs can use. Our second empirical result is that we applied this search to our existing data test case and found a reliable increase in total system throughput. The search over layouts is very expensive, but we argue that our method is practical and useful, as entities trying to scale up their Web-based applications would be perfectly happy to spend a few weeks of CPU time to increase their system throughput (and potentially reduce the investment in hardware). To make this search more practical, we want to learn reasonable rules to guide the search to eliminate many layout configurations that are not very likely to succeed. The second aspect of our project (not reported here) is to use the created configurations as input into a machine learning system, to create general rules about when to use the different layout operators.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A., Rasin, A.: Hadoopdb: an architectural hybrid of mapreduce and dbms technologies for analytical workloads. Proc. VLDB Endow. 2, 922–933 (2009)
Article Google Scholar
Agrawal, S., Chaudhuri, S., Kollar, L., Marathe, A., Narasayya, V., Syamala, M.: Database tuning advisor for microsoft sql server 2005. In: Proceedings of VLDB, pp. 1110–1121 (2004)
Google Scholar
Agrawal, S., Chaudhuri, S., Narasayya, V.R.: Automated selection of materialized views and indexes in sql databases. In: Proceedings of the 26th International Conference on Very Large Data Bases, VLDB 2000, pp. 496–505. Morgan Kaufmann Publishers Inc., San Francisco (2000)
Google Scholar
Amiri, K., Park, S., Tewari, R., Padmanabhan, S.: DBProxy: A Dynamic Data Cache for Web Applications. In: IEEE Int’l Conference on Data Engineering (ICDE), Bangalore, India (March 2003)
Google Scholar
Dataupia. Dataupia satori server (2008), http://www.dataupia.com/pdfs/productoverview/Dataupia%20Product%20Overview.pdf
Davidson, G.S., Boyack, K.W., Zacharski, R.A., Helmreich, S.C., Cowie, J.R.: Data-centric computing with the netezza architecture. In: Sandia Report, Unlimited Release (April 2006)
Google Scholar
Dean, J., Ghemawat, S., Inc, G.: Mapreduce: simplified data processing on large clusters. In: OSDI 2004: Proceedings of the 6th Conference on Symposium on Opearting Systems Design and Implementation. USENIX Association (2004)
Google Scholar
Greenplum. Greenplum database 3.2 administrator guide (2008), http://docs.huihoo.com/greenplum/GPDB-3.2-AdminGuide.pdf
Groothuyse, T., Sivasubramanian, S., Pierre, G.: GlobeTP: Template-Based Database Replication for Scalable Web Applications. In: Int’l World Wide Web Conf (WWW), Alberta, Canada (May 2007)
Google Scholar
P. G. D. Group. Postgresql (2011), http://www.postgresql.org
Heffernan, N.T., Turner, T.E., Lourenco, A.L.N., Macasek, M.A., Nuzzo-Jones, G., Koedinger, K.R.: The ASSISTment Builder: Towards an Analysis of Cost Effectiveness of ITS creation. In: FLAIRS, Florida, USA (2006)
Google Scholar
MacFarland, A.: The speed of paraccels data warehousing solution changes the economics of business insight (2010), http://www.clipper.com/research/TCG2010008.pdf
Olston, C., Manjhi, A., Garrod, C., Ailamaki, A., Maggs, B.M., Mowry, T.C.: A scalability service for dynamic web applications. In: Proc. CIDR, pp. 56–69 (2005)
Google Scholar
Papadomanolakis, E., Ailamaki, A.: Autopart: Automating schema design for large scientific databases using data partitioning. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 383–392. IEEE Computer Society, Los Alamitos (2004)
Google Scholar
A. O. W. Paper. A technical overview of the sun oracle database machine and exadata storage server (2010), http://www.oracle.com/technetwork/database/exadata/exadata-technical-whitepaper-134575.pdf
Patvarczki, J., Mani, M., Heffernan, N.: Performance driven database design for scalable web applications. In: Grundspenkis, J., Morzy, T., Vossen, G. (eds.) ADBIS 2009. LNCS, vol. 5739, pp. 43–58. Springer, Heidelberg (2009)
Chapter Google Scholar
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., DeWitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: SIGMOD 2009: Proceedings of the 35th SIGMOD International Conference on Management of Data, pp. 165–178. ACM, New York (2009)
Google Scholar
Plattner, C., Alonso, G.: Ganymed: Scalable replication for transactional web applications. In: Jacobsen, H.-A. (ed.) Middleware 2004. LNCS, vol. 3231, pp. 155–174. Springer, Heidelberg (2004)
Chapter Google Scholar
Plattner, C., Alonso, G., Özsu, M.T.: DbFarm: A scalable cluster for multiple databases. In: van Steen, M., Henning, M. (eds.) Middleware 2006. LNCS, vol. 4290, pp. 180–200. Springer, Heidelberg (2006)
Chapter Google Scholar
Ramamurthy, R., DeWitt, D.J., Su, Q.: A case for fractured mirrors. In: International Conference on Very Large Databases, pp. 430–441. Morgan Kaufmann Publishers, San Francisco (2002)
Chapter Google Scholar
Rao, J., Zhang, C., Lohman, G., Megiddo, N., Rao, J., Zhang, C., Megiddo, N.: Automating physical database design in a parallel database. In: Proc. 2002 ACM SIGMOD, pp. 558–569 (2002)
Google Scholar
Shafer, J., Rixner, S., Cox, A.L.: The hadoop distributed filesystem: Balancing portability and performance (April 2010)
Google Scholar
Sivasubramanian, S., Pierre, G., van Steen, M.: GlobeDB: Autonomic Data Replication for Web Applications. In: Int’l World Wide Web Conf (WWW), Chiba, Japan (May 2005)
Google Scholar
Ślezak, D., Eastwood, V.: Data warehouse technology by infobright. In: Proceedings of the 35th SIGMOD International Conference on Management of Data SIGMOD 2009, pp. 841–846. ACM, New York (2009)
Google Scholar
Sue Clarke, T.: Butler group research paper (October 2000), http://www.teradata.com/library/pdf/butler_100101.pdf
Vermeij, M., Quak, W., Kersten, M., Nes, N.: Monetdb, a novel spatial column-store dbms. In: Netterberg, I., Coetzee, S. (eds.) Academic Proceedings of the 2008 Free and Open Source for Geospatial (FOSS4G) Conference, OSGeo, pp. 193–199 (2008)
Google Scholar
Vertica. The vertica analytic database–introducing a new era in dbms performance and efficiency (2009), http://www.redhat.com/solutions/intelligence/collateral/vertica_new_era_in_dbms_performance.pdf
Winter, R.: Hp neoview architecture and performance (2009), http://h20195.www2.hp.com/v2/GetPDF.aspx/4AA2-6924ENW.pdf
Zilio, D.C., Jhingran, A., Padmanabhan, S.: Partitioning key selection for a shared-nothing parallel database system. IBM Research Report RC (1994)
Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science Department, Worcester Polytechnic Institute, 100 Institute Road, Worcester, Massachusetts, USA
Jozsef Patvarczki & Neil T. Heffernan

Authors

Jozsef Patvarczki
View author publications
You can also search for this author in PubMed Google Scholar
Neil T. Heffernan
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Alpen Adria Universität Klagenfurt, Institut für Informatik-Systeme, Universitätsstr. 65, 9020, Klagenfurt, Austria
Johann Eder
Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Ilkoviçova 3, 842 16, Bratislava, Slovakia
Maria Bielikova
Institut für Softwaretechnik, Technische Universität Wien, Favoritenstr. 9-11/188, 1040, Wien, Austria
A Min Tjoa

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Patvarczki, J., Heffernan, N.T. (2011). Automatic Physical Database Tuning Middleware for Web-Based Applications. In: Eder, J., Bielikova, M., Tjoa, A.M. (eds) Advances in Databases and Information Systems. ADBIS 2011. Lecture Notes in Computer Science, vol 6909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23737-9_26

Download citation

DOI: https://doi.org/10.1007/978-3-642-23737-9_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23736-2
Online ISBN: 978-3-642-23737-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics