Sorting in Column Stores

  • Focus article
Datenbank-Spektrum

Abstract

In recent years, we have seen a number of new database architectures based on the idea of vertical fragmentation of relations. These architectures target the analysis of huge amounts of relational data: vertical fragmentation facilitates the column scans that are common in analytic applications, at the expense of single-tuple operations. Although sorting is a common operation in analytics, little is known about sorting vertically fragmented relations. This paper compares various ways to apply (external) merge sort to vertically fragmented relations on different layers of the memory hierarchy and gives hints on when to apply which one. We propose a greedy algorithm to find the optimal combination of steps that yields a sorted version of a given relation stored column-wise.
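
One common way to sort a relation stored column-wise is to sort only the key column together with the row positions, which yields a permutation, and then reorder every other column through that permutation. The following is a minimal sketch of this idea under simplifying assumptions: each attribute is a Python list of equal length, all data fits in memory, and the function names `merge_sort_positions` and `sort_column_store` are hypothetical. It is an illustration of the general technique, not necessarily one of the strategies evaluated in the paper.

```python
from typing import Dict, List, Sequence


def merge_sort_positions(keys: Sequence) -> List[int]:
    """Return row positions ordered by key, using a stable top-down merge sort."""
    positions = list(range(len(keys)))

    def sort(lo: int, hi: int) -> List[int]:
        # A range of zero or one positions is already sorted.
        if hi - lo <= 1:
            return positions[lo:hi]
        mid = (lo + hi) // 2
        left, right = sort(lo, mid), sort(mid, hi)
        merged: List[int] = []
        i = j = 0
        while i < len(left) and j < len(right):
            # "<=" keeps rows with equal keys in their original order (stability).
            if keys[left[i]] <= keys[right[j]]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    return sort(0, len(keys))


def sort_column_store(columns: Dict[str, list], key: str) -> Dict[str, list]:
    """Sort a relation stored as one list per attribute by the attribute `key`."""
    perm = merge_sort_positions(columns[key])
    # Gather every column (including the key column) through the same permutation.
    return {name: [values[p] for p in perm] for name, values in columns.items()}
```

For example, `sort_column_store({"id": [3, 1, 2], "name": ["c", "a", "b"]}, "id")` returns `{"id": [1, 2, 3], "name": ["a", "b", "c"]}`. Only the key column is compared during sorting; the other columns are touched once, in the final gather step.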

Notes

  1. We assume that there are no unused attributes in \(\mathcal{A}\); otherwise, they could simply be ignored, which the assumed column orientation makes possible.

  2. http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/.

  3. Disks have a physical block size as well (512 B or 4 kB), but a database or operating system may choose to combine several physical blocks into one logical block.

  4. This statement may, however, seem questionable from today’s perspective [4].

  5. http://www.tpc.org.

  6. Due to time and space constraints, the merge sort was executed only on in-memory data sets, not on disk.
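
Note 6 distinguishes between running merge sort in memory and on disk. As a rough illustration of the on-disk case, the following sketch simulates the two phases of an external merge sort over a single key column: run generation under a memory budget of `run_size` (key, position) pairs, followed by a k-way merge of the sorted runs with Python's `heapq.merge`. This is a simplified assumption-laden model: runs are kept in lists rather than written to disk files, and `generate_runs`, `merge_runs`, and `run_size` are hypothetical names chosen for this example.

```python
import heapq
from typing import Iterator, List, Sequence, Tuple

Run = List[Tuple[object, int]]  # sorted (key, row position) pairs


def generate_runs(keys: Sequence, run_size: int) -> List[Run]:
    """Phase 1: cut the key column into chunks that fit the memory budget and sort each."""
    runs: List[Run] = []
    for start in range(0, len(keys), run_size):
        end = min(start + run_size, len(keys))
        chunk = [(keys[p], p) for p in range(start, end)]
        chunk.sort()  # pairs compare by key first, then by row position
        runs.append(chunk)  # a disk-based sort would write this run to a file instead
    return runs


def merge_runs(runs: List[Run]) -> Iterator[int]:
    """Phase 2: k-way merge of all sorted runs, yielding row positions in key order."""
    for _key, position in heapq.merge(*runs):
        yield position
```

With keys `[5, 3, 9, 1, 7]` and `run_size=2`, three runs are produced, and `list(merge_runs(generate_runs([5, 3, 9, 1, 7], 2)))` yields the row positions `[3, 1, 0, 4, 2]`, which can then be used to reorder the remaining columns of the relation. Because ties on the key are broken by row position, the result preserves the original order of equal keys.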

References

  1. Abadi DJ (2008) Query execution in column-oriented database systems. MIT PhD dissertation

  2. Ailamaki A, DeWitt DJ, Hill MD, Skounakis M (2001) Weaving relations for cache performance. In: Proceedings of the 27th international conference on very large data bases, VLDB ’01, San Francisco, CA, USA. Kaufmann, Los Altos, pp 169–180

  3. Abadi DJ, Myers DS, DeWitt DJ, Madden SR (2007) Materialization strategies in a column-oriented DBMS. In: Proceedings of the IEEE international conference on data engineering, ICDE ’07, pp 466–475

  4. Boncz PA, Kersten ML, Manegold S (2008) Breaking the memory wall in MonetDB. Commun ACM 51:77–85

  5. Boncz PA (2002) Monet: a next-generation DBMS kernel for query-intensive applications. PhD thesis, Universiteit van Amsterdam, Amsterdam, The Netherlands, May

  6. Bösswetter D (2009) SPAX: PAX with super-pages. In: Grundspenkis J, Morzy T, Vossen G (eds) ADBIS. Lecture notes in computer science, vol 5739. Springer, Berlin, pp 362–377

  7. Copeland GP, Khoshafian SN (1985) A decomposition storage model. SIGMOD Rec 14(4):268–279

  8. Chhugani J, Nguyen AD, Lee VW, Macy W, Hagog M, Chen Y-K, Baransi A, Kumar S, Dubey P (2008) Efficient implementation of sorting on multi-core SIMD CPU architecture. PVLDB 1(2):1313–1324

  9. Graefe G (2006) Implementing sorting in database systems. ACM Comput Surv 38:1–37

  10. Hankins RA, Patel JM (2003) Data morphing: an adaptive, cache-conscious storage technique. In: Proceedings of the 29th international conference on very large data bases, VLDB ’03, pp 417–428

  11. Martin WA (1971) Sorting. ACM Comput Surv 3:147–174

  12. Manegold S, Boncz PA, Kersten ML (2002) Generic database cost models for hierarchical memory systems. In: VLDB. Kaufmann, Los Altos, pp 191–202

  13. Navathe S, Wiederhold G, Dou J (1984) Vertical partitioning algorithms for database design. ACM Trans Database Syst 9(4):680–710

  14. Plattner H (2009) A common database approach for OLTP and OLAP using an in-memory column database. In: Proceedings of the 35th SIGMOD international conference on management of data, SIGMOD ’09, New York, NY, USA. ACM, New York, pp 1–2

  15. Stonebraker M, Abadi DJ, Batkin A, Chen X, Cherniack M, Ferreira M, Lau E, Lin A, Madden S, O’Neil E, O’Neil P, Rasin A, Tran N, Zdonik S (2005) C-Store: a column-oriented DBMS. In: Proceedings of the 31st international conference on very large data bases, VLDB ’05, pp 553–564

Author information

Correspondence to Daniel Bößwetter.

About this article

Cite this article

Bößwetter, D. Sorting in Column Stores. Datenbank Spektrum 11, 91–100 (2011). https://doi.org/10.1007/s13222-011-0054-6

  • DOI: https://doi.org/10.1007/s13222-011-0054-6
