ABSTRACT
Flash solid state drives (SSDs) provide an attractive alternative to traditional magnetic hard disk drives (HDDs) for DBMS applications. Naturally, there is substantial interest in redesigning critical database internals, such as join algorithms, for flash SSDs. However, we must carefully consider the lessons that we have learnt from over three decades of designing and tuning algorithms for magnetic HDD-based systems, so that we continue to reuse the techniques that worked for magnetic HDDs and that also work for flash SSDs.
The focus of this paper is on recalling some of these lessons in the context of ad hoc join algorithms. Based on an actual implementation of four common ad hoc join algorithms on both a magnetic HDD and a flash SSD, we show that many of the "surprising" results from magnetic HDD-based join methods also hold for flash SSDs. These results include the superiority of block nested loops join over sort-merge join and Grace hash join in many cases, and the benefits of blocked I/Os. In addition, we find that simply looking at the I/O costs when designing new flash SSD join algorithms can be problematic, as the CPU cost is often a bigger component of the total join cost with SSDs. We hope that these results provide insights and better starting points for researchers designing new join algorithms for flash SSDs.
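As a reminder of the block nested loops join mentioned above, here is a minimal in-memory sketch in Python. It is not the implementation evaluated in the paper: the function name, its parameters, and the per-block hash table are assumptions made for illustration, and Python lists stand in for relations stored on disk. The point it shows is that the inner relation is scanned once per block of outer rows rather than once per outer row.

```python
from collections import defaultdict
from itertools import islice


def block_nested_loops_join(outer, inner, outer_key, inner_key, block_size):
    """Yield (outer_row, inner_row) pairs whose join keys are equal.

    Hypothetical helper for illustration. `inner` must be re-iterable
    (for example a list), because it is scanned once per outer block.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")

    outer_iter = iter(outer)
    while True:
        # Load the next block of outer rows into memory.
        block = list(islice(outer_iter, block_size))
        if not block:
            break

        # Index the block by join key so each inner row is probed once.
        table = defaultdict(list)
        for row in block:
            table[outer_key(row)].append(row)

        # One full scan of the inner relation per outer block.
        for inner_row in inner:
            for outer_row in table.get(inner_key(inner_row), ()):
                yield outer_row, inner_row


# Usage with toy relations of (key, payload) tuples.
orders = [(1, "a"), (2, "b"), (3, "c")]
items = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
pairs = list(
    block_nested_loops_join(
        orders, items, lambda r: r[0], lambda r: r[0], block_size=2
    )
)
```

With a block size of 2, the three outer rows form two blocks, so the inner list is scanned twice. A larger block size reduces the number of inner scans at the cost of more memory for the block.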