Abstract
Integrating a solid state drive (SSD) into a data store is expected to improve its I/O performance. However, there is still a large price gap between an SSD and a hard-disk drive (HDD). One way to offset the added device cost is to build a hybrid system that uses both devices. In such a system, record placement is commonly decided by reference locality, i.e., frequently accessed records are placed on the faster SSD. In this paper, we propose an alternative that focuses on data skew: records whose values appear less often are stored on an SSD, while records whose values appear more often are stored on an HDD. As we will show, this improves the performance of fetching records using multi-dimensional indices. When records are fetched using one of the indices targeted for optimization, records stored on the SSD are likely to be retrieved using random access, while those stored on the HDD are likely to be retrieved using sequential access. Because the method does not rely on reference locality, its performance is stable between first and second accesses, and it provides a performance gain even when host memory is large enough to contain the entire working set of the application. Our implementation and experiments show that storing just \(20\,\%\) of the records on an SSD achieves up to \(76\,\%\) of the maximum reduction that would be obtained if all the records were stored on an SSD.
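The placement rule described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which targets multi-dimensional indices and is not specified here; it is a simplified single-column version under stated assumptions. The function name `place_by_skew`, the `key` accessor, and the greedy fill up to an `ssd_fraction` budget are all hypothetical choices made for illustration.

```python
from collections import Counter


def place_by_skew(records, key, ssd_fraction=0.2):
    """Split records into (ssd, hdd) lists by how rare their key value is.

    Assumption: a single indexed column and a greedy fill. Values are taken
    from rarest to most frequent, and a value goes to the SSD only if all of
    its records still fit within the SSD budget.
    """
    counts = Counter(key(r) for r in records)
    budget = int(len(records) * ssd_fraction)  # max number of records on SSD

    ssd_values = set()
    used = 0
    # sorted() is stable, so values with equal counts keep first-seen order.
    for value, n in sorted(counts.items(), key=lambda kv: kv[1]):
        if used + n > budget:
            break  # the next-rarest value no longer fits; stop filling
        ssd_values.add(value)
        used += n

    ssd = [r for r in records if key(r) in ssd_values]
    hdd = [r for r in records if key(r) not in ssd_values]
    return ssd, hdd


if __name__ == "__main__":
    cities = ["NYC"] * 6 + ["LA"] * 2 + ["Reno", "Bend"]
    rows = list(enumerate(cities))  # (record_id, city) pairs
    ssd, hdd = place_by_skew(rows, key=lambda r: r[1], ssd_fraction=0.2)
    print("SSD:", ssd)
    print("HDD:", hdd)
```

In this toy data set, an index lookup on a rare city touches only a record or two, which suits the random-access strength of an SSD, while a lookup on a frequent city touches many records that can be read sequentially from an HDD.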
J. Suzuki was a visiting scholar at the University of California, Berkeley, when this work was done.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Suzuki, J., Venkataraman, S., Agarwal, S., Franklin, M., Stoica, I. (2014). Record Placement Based on Data Skew Using Solid State Drives. In: Zhan, J., Han, R., Weng, C. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2014. Lecture Notes in Computer Science, vol 8807. Springer, Cham. https://doi.org/10.1007/978-3-319-13021-7_14
DOI: https://doi.org/10.1007/978-3-319-13021-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13020-0
Online ISBN: 978-3-319-13021-7
eBook Packages: Computer Science (R0)