Abstract
In recent years, a number of new database architectures based on vertical fragmentation of relations have emerged. These architectures target the analysis of huge amounts of relational data, because vertical fragmentation facilitates column scans, which are common in analytic applications, at the expense of single-tuple operations. Although sorting is a common operation in analytics, little is known about sorting vertically fragmented relations. This paper compares various ways of applying (external) merge sort to vertically fragmented relations on different layers of the memory hierarchy and gives hints on when to apply which one. We propose a greedy algorithm to find the optimum mixture of steps that leads to a sorted version of a given relation that is stored column-wise.
Notes
We assume that there are no unused attributes in \(\mathcal{A}\); otherwise, those would simply be ignored, which is possible because of the assumed column orientation.
Disks have a physical block size as well (512 B or 4 kB), but a database or operating system may choose to combine several physical blocks into one logical block.
Although this statement sounds questionable from today’s perspective [4].
Due to time and space constraints, the merge sort was only executed for data sets in memory, not on disk.
References
Abadi DJ (2008) Query execution in column-oriented database systems. MIT PhD dissertation
Ailamaki A, DeWitt DJ, Hill MD, Skounakis M (2001) Weaving relations for cache performance. In: Proceedings of the 27th international conference on very large data bases, VLDB ’01, San Francisco, CA, USA. Kaufmann, Los Altos, pp 169–180
Abadi DJ, Myers DS, DeWitt DJ, Madden SR (2007) Materialization strategies in a column-oriented DBMS. In: Data engineering, international conference on, pp 466–475
Boncz PA, Kersten ML, Manegold S (2008) Breaking the memory wall in MonetDB. Commun ACM 51:77–85
Boncz PA (2002) Monet: a next-generation DBMS kernel for query-intensive applications. PhD thesis, Universiteit van Amsterdam, Amsterdam, The Netherlands, May
Bösswetter D (2009) Spax—pax with super-pages. In: Grundspenkis J, Morzy T, Vossen G (eds) ADBIS. Lecture notes in computer science, vol 5739. Springer, Berlin, pp 362–377
Copeland GP, Khoshafian SN (1985) A decomposition storage model. SIGMOD Rec 14(4):268–279
Chhugani J, Nguyen AD, Lee VW, Macy W, Hagog M, Chen Y-K, Baransi A, Kumar S, Dubey P (2008) Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB 1(2):1313–1324
Graefe G (2006) Implementing sorting in database systems. ACM Comput Surv 38:1–37
Hankins RA, Patel JM (2003) Data morphing: an adaptive, cache-conscious storage technique. In: Proceedings of the 29th international conference on very large data bases VLDB endowment, VLDB ’03, pp 417–428
Martin WA (1971) Sorting. ACM Comput Surv 3:147–174
Manegold S, Boncz PA, Kersten ML (2002) Generic database cost models for hierarchical memory systems. In: VLDB. Kaufmann, Los Altos, pp 191–202
Navathe S, Wiederhold G, Dou J (1984) Vertical partitioning algorithms for database design. ACM Trans Database Syst 9(4):680–710
Plattner H (2009) A common database approach for OLTP and OLAP using an in-memory column database. In: Proceedings of the 35th SIGMOD international conference on management of data, SIGMOD ’09, New York, NY, USA, pp 1–2. ACM, New York
Stonebraker M, Abadi DJ, Batkin A, Chen X, Cherniack M, Ferreira M, Lau E, Lin A, Madden S, O’Neil E, O’Neil P, Rasin A, Tran N, Zdonik S (2005) C-Store: a column-oriented DBMS. In: Proceedings of the 31st international conference on very large data bases VLDB endowment, VLDB ’05, pp 553–564
Cite this article
Bößwetter, D. Sorting in Column Stores. Datenbank Spektrum 11, 91–100 (2011). https://doi.org/10.1007/s13222-011-0054-6
DOI: https://doi.org/10.1007/s13222-011-0054-6