
Workload-Independent Data-Driven Vertical Partitioning

  • Conference paper
  • First Online:
New Trends in Databases and Information Systems (ADBIS 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 767)


Abstract

Vertical partitioning is a well-explored area of automatic physical database design. The classic approach is as follows: derive an optimal vertical partitioning scheme for a given database and a workload. The workload describes queries, their frequencies, and involved attributes.
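Workload-aware algorithms typically start by condensing the workload into an attribute affinity matrix. The sketch below is a minimal, hypothetical illustration of that input and is not code from this paper. It assumes each query is given as a (frequency, set of accessed attributes) pair, and the attribute names are invented. Each entry sums the frequencies of the queries that access both attributes.

```python
from itertools import combinations


def affinity_matrix(attributes, workload):
    """Build an attribute affinity matrix from a workload.

    Assumed input format (hypothetical): workload is a list of
    (frequency, set_of_accessed_attributes) pairs.
    aff[a][b] = total frequency of queries that access both a and b;
    aff[a][a] = total frequency of queries that access a.
    """
    aff = {a: {b: 0 for b in attributes} for a in attributes}
    for freq, used in workload:
        for a in used:
            aff[a][a] += freq
        # Count each co-accessed pair once per query, symmetrically.
        for a, b in combinations(sorted(used), 2):
            aff[a][b] += freq
            aff[b][a] += freq
    return aff


# Hypothetical workload over invented attribute names.
workload = [
    (10, {"store", "item"}),            # query Q1, executed 10 times
    (5, {"item", "price"}),             # query Q2, executed 5 times
    (1, {"store", "price", "volume"}),  # query Q3, executed once
]
aff = affinity_matrix(["store", "item", "price", "volume"], workload)
print(aff["store"]["item"])  # 10
```

Workload-aware algorithms then cluster attributes with high mutual affinity into the same fragment. The workload-independent approach described in this paper avoids depending on this input.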

In this paper we identify a novel class of vertical partitioning algorithms. The algorithms of this class do not rely on knowledge of the workload; instead, they use data properties that are contained in the data itself. We propose one such algorithm, which uses a logical scheme represented by functional dependencies derived from the stored data. To discover functional dependencies we use TANE, a popular functional dependency extraction algorithm. We evaluate our algorithm using an industrial DBMS (PostgreSQL) on a number of workloads, comparing the performance of an unpartitioned configuration with partitions produced by our algorithm and by several state-of-the-art workload-aware algorithms.
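The core test behind functional dependency discovery can be shown in a few lines. A dependency X → A holds exactly when the number of distinct X-values equals the number of distinct (X, A)-values; TANE uses this partition-cardinality criterion.

The sketch below is a brute-force illustration under stated assumptions, not TANE itself and not the authors' partitioning algorithm:

- Rows are assumed to be a list of dicts.
- The left-hand-side size cap `max_lhs` is a hypothetical parameter.
- The toy data are invented.

TANE additionally prunes the attribute lattice level by level and works with stripped partitions.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts).

    X -> A holds iff the number of distinct X-values equals the
    number of distinct (X, A)-values (partition-cardinality test).
    """
    x_vals = {tuple(r[c] for c in lhs) for r in rows}
    xa_vals = {tuple(r[c] for c in lhs) + (r[rhs],) for r in rows}
    return len(x_vals) == len(xa_vals)


def minimal_fds(rows, attributes, max_lhs=2):
    """Brute-force search for minimal, non-trivial FDs with |lhs| <= max_lhs.

    Not TANE: every candidate is checked directly against the data.
    """
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(attributes, size):
            for a in attributes:
                if a in lhs:
                    continue  # trivial dependency, skip
                # Skip non-minimal candidates: a proper subset already determines a.
                if any(set(l) < set(lhs) and r == a for l, r in found):
                    continue
                if fd_holds(rows, lhs, a):
                    found.append((lhs, a))
    return found


# Invented toy table.
rows = [
    {"store": 1, "city": "Ames", "item": "A", "price": 10},
    {"store": 1, "city": "Ames", "item": "B", "price": 12},
    {"store": 2, "city": "Ames", "item": "A", "price": 10},
    {"store": 3, "city": "Davenport", "item": "B", "price": 12},
]
print(minimal_fds(rows, ["store", "city", "item", "price"]))
# [(('store',), 'city'), (('item',), 'price'), (('price',), 'item')]
```

Dependencies found this way describe the logical structure of the stored data without reference to any query workload. That is the kind of input a workload-independent partitioner can use.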

This work is partially supported by Russian Foundation for Basic Research grant 16-57-48001.


Notes

  1. TANE implementation. http://www.cs.helsinki.fi/research/fdk/datamining/tane/.

  2. Iowa Liquor. https://data.iowa.gov/Economy/Iowa-Liquor-Sales/m3tr-qhgy.

  3. https://www.reddit.com/r/bigquery/comments/37fcm6/iowa_liquor_sales_dataset_879mb_3million_rows/?st=j3ppu30u&sh=35bdeeb2.

  4. http://www.math.spbu.ru/user/chernishev/papers/iowa_queries.txt.


Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments on this work.

Author information


Correspondence to George Chernishev.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bobrov, N., Chernishev, G., Novikov, B. (2017). Workload-Independent Data-Driven Vertical Partitioning. In: Kirikova, M., et al. New Trends in Databases and Information Systems. ADBIS 2017. Communications in Computer and Information Science, vol 767. Springer, Cham. https://doi.org/10.1007/978-3-319-67162-8_27


  • DOI: https://doi.org/10.1007/978-3-319-67162-8_27

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67161-1

  • Online ISBN: 978-3-319-67162-8

  • eBook Packages: Computer Science, Computer Science (R0)
