Sorting in Column Stores

Bößwetter, Daniel

doi:10.1007/s13222-011-0054-6

Focus article
Published: 25 May 2011

Volume 11, pages 91–100 (2011)

Datenbank-Spektrum


Abstract

In recent years, we have seen a number of new database architectures based on the idea of vertical fragmentation of relations. These architectures target the analysis of huge amounts of relational data, because vertical fragmentation facilitates column scans, which are common in analytic applications, at the expense of single-tuple operations. Although sorting is a common operation in analytics, little is known about sorting vertically fragmented relations. This paper compares various ways to apply (external) merge sort to vertically fragmented relations on different layers of the memory hierarchy and gives hints on when to apply which one. We propose a greedy algorithm to find the optimum mixture of steps that leads to a sorted version of a given relation that is stored column-wise.
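To make the setting concrete, here is a minimal in-memory sketch of one possible way to merge-sort a relation stored column-wise. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The sketch sorts only row positions by the key column in fixed-size runs, merges the runs, and then applies the resulting permutation to every column. The names `sort_positions_by_runs`, `sort_columns`, and `run_size` are hypothetical and chosen for illustration.

```python
import heapq
from typing import Dict, List


def sort_positions_by_runs(key_col: List, run_size: int) -> List[int]:
    """Return row positions ordered by key_col, using sorted runs plus a merge."""
    if run_size <= 0:
        raise ValueError("run_size must be positive")
    # Run formation: sort the positions inside each fixed-size chunk.
    runs = [
        sorted(range(start, min(start + run_size, len(key_col))),
               key=key_col.__getitem__)
        for start in range(0, len(key_col), run_size)
    ]
    # Merge phase: k-way merge of the sorted runs, comparing key values.
    return list(heapq.merge(*runs, key=key_col.__getitem__))


def sort_columns(columns: Dict[str, List], key: str, run_size: int = 2) -> Dict[str, List]:
    """Sort a column-wise relation by one key column (illustrative only)."""
    order = sort_positions_by_runs(columns[key], run_size)
    # Apply the same permutation to every column so tuples stay aligned.
    return {name: [col[i] for i in order] for name, col in columns.items()}


# A tiny relation with three attributes, each stored as its own column.
relation = {
    "id": [3, 1, 2],
    "name": ["c", "a", "b"],
    "price": [30.0, 10.0, 20.0],
}
sorted_relation = sort_columns(relation, "id", run_size=2)
```

Only the key column is touched during sorting and merging; the remaining columns are read once when the permutation is applied. This is one of several conceivable strategies for a column store, and the sketch makes no claim about which one performs best.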


Notes

We assume that there are no unused attributes in \(\mathcal{A}\); otherwise, those would simply be ignored, which is possible due to the assumed column orientation.
http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/.
Disks have a physical block size as well (512 B or 4 kB), but a database or operating system may choose to combine several physical blocks into one logical block.
Although this statement sounds questionable from today’s perspective [4].
http://www.tpc.org.
Due to time and space constraints, the merge sort was only executed for data sets in memory, not on disk.

References

[1] Abadi DJ (2008) Query execution in column-oriented database systems. MIT PhD dissertation
[2] Ailamaki A, DeWitt DJ, Hill MD, Skounakis M (2001) Weaving relations for cache performance. In: Proceedings of the 27th international conference on very large data bases, VLDB ’01, San Francisco, CA, USA. Kaufmann, Los Altos, pp 169–180
[3] Abadi DJ, Myers DS, DeWitt DJ, Madden SR (2007) Materialization strategies in a column-oriented DBMS. In: Data engineering, international conference on, pp 466–475
[4] Boncz PA, Kersten ML, Manegold S (2008) Breaking the memory wall in MonetDB. Commun ACM 51:77–85
[5] Boncz PA (2002) Monet: a next-generation DBMS kernel for query-intensive applications. PhD thesis, Universiteit van Amsterdam, Amsterdam, The Netherlands, May
[6] Bösswetter D (2009) SPAX – PAX with super-pages. In: Grundspenkis J, Morzy T, Vossen G (eds) ADBIS. Lecture notes in computer science, vol 5739. Springer, Berlin, pp 362–377
[7] Copeland GP, Khoshafian SN (1985) A decomposition storage model. SIGMOD Rec 14(4):268–279
[8] Chhugani J, Nguyen AD, Lee VW, Macy W, Hagog M, Chen Y-K, Baransi A, Kumar S, Dubey P (2008) Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB 1(2):1313–1324
[9] Graefe G (2006) Implementing sorting in database systems. ACM Comput Surv 38:1–37
[10] Hankins RA, Patel JM (2003) Data morphing: an adaptive, cache-conscious storage technique. In: Proceedings of the 29th international conference on very large data bases, VLDB ’03. VLDB Endowment, pp 417–428
[11] Martin WA (1971) Sorting. ACM Comput Surv 3:147–174
[12] Manegold S, Boncz PA, Kersten ML (2002) Generic database cost models for hierarchical memory systems. In: VLDB. Kaufmann, Los Altos, pp 191–202
[13] Navathe S, Wiederhold G, Dou J (1984) Vertical partitioning algorithms for database design. ACM Trans Database Syst 9(4):680–710
[14] Plattner H (2009) A common database approach for OLTP and OLAP using an in-memory column database. In: Proceedings of the 35th SIGMOD international conference on management of data, SIGMOD ’09. ACM, New York, NY, USA, pp 1–2
[15] Stonebraker M, Abadi DJ, Batkin A, Chen X, Cherniack M, Ferreira M, Lau E, Lin A, Madden S, O’Neil E, O’Neil P, Rasin A, Tran N, Zdonik S (2005) C-store: a column-oriented DBMS. In: Proceedings of the 31st international conference on very large data bases, VLDB ’05. VLDB Endowment, pp 553–564


Author information

Authors and Affiliations

Institute of Computer Science, Database and Information Systems Group, Freie Universität Berlin, Takustraße 9, 14195 Berlin, Germany
Daniel Bößwetter


Corresponding author

Correspondence to Daniel Bößwetter.


About this article

Cite this article

Bößwetter, D. Sorting in Column Stores. Datenbank Spektrum 11, 91–100 (2011). https://doi.org/10.1007/s13222-011-0054-6


Received: 17 March 2011
Accepted: 09 May 2011
Published: 25 May 2011
Issue Date: August 2011
DOI: https://doi.org/10.1007/s13222-011-0054-6
