Record Placement Based on Data Skew Using Solid State Drives

Suzuki, Jun; Venkataraman, Shivaram; Agarwal, Sameer; Franklin, Michael; Stoica, Ion

doi:10.1007/978-3-319-13021-7_14


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8807)

Included in the following conference series:

Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware


Abstract

Integrating a solid state drive (SSD) into a data store is expected to improve its I/O performance. However, there is still a large difference between the price of an SSD and that of a hard-disk drive (HDD). One way to offset the increased cost of the constituent devices is to configure a hybrid system that uses both kinds of device. In such a system, a common way to decide the placement of data records is based on reference locality, i.e., placing frequently accessed records on the faster SSD. In this paper, we propose an alternative that focuses on data skew: records whose values appear less often are stored in an SSD, while records whose values appear more often are stored in an HDD. As we will show, this enhances the performance of fetching records using multi-dimensional indices. When records are fetched using one of the indices targeted for optimization, records stored in an SSD are likely to be retrieved using random access, while those stored in an HDD are likely to be retrieved using sequential access. Because the method does not rely on reference locality, its performance is stable between first and second accesses, and it provides a performance gain even when host memory is large enough to contain the entire working set of the application. Our implementation and experiments show that storing just 20% of the records in an SSD achieves up to 76% of the maximum reduction that would otherwise be obtained when all the records are stored in an SSD.
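
The placement idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the algorithm, and the paper targets multi-dimensional indices, whereas the sketch below assumes a single indexed attribute and a fixed SSD budget expressed as a percentage of the record count. The names `place_by_skew`, `key`, and `ssd_percent` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def place_by_skew(records, key, ssd_percent=20):
    """Split records into (ssd, hdd) lists by how rare their key value is.

    Assumption: one indexed attribute, extracted by `key`. Values that
    occur least often are assigned to the SSD until the budget
    (ssd_percent of all records) would be exceeded.
    """
    counts = Counter(key(r) for r in records)
    budget = len(records) * ssd_percent // 100  # max records on the SSD

    ssd_values = set()
    used = 0
    # Visit the rarest values first; ties are broken by the value's repr
    # so the result is deterministic.
    for value, n in sorted(counts.items(), key=lambda kv: (kv[1], repr(kv[0]))):
        if used + n > budget:
            break  # the next-rarest group no longer fits on the SSD
        ssd_values.add(value)
        used += n

    ssd = [r for r in records if key(r) in ssd_values]
    hdd = [r for r in records if key(r) not in ssd_values]
    return ssd, hdd


# Toy data set: (record id, city). "tokyo" dominates, so it is the skewed value.
cities = ["tokyo"] * 6 + ["osaka"] * 2 + ["nara", "kobe"]
records = list(enumerate(cities))

ssd, hdd = place_by_skew(records, key=lambda r: r[1], ssd_percent=20)
print("SSD:", ssd)
print("HDD:", hdd)
```

With this toy data set, the two records whose city appears only once go to the SSD, where each would be fetched by a separate random read, and the eight records with the more frequent values stay on the HDD, where a lookup on a frequent value can read many matching records sequentially.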

J. Suzuki—Visiting scholar at University of California, Berkeley when this work was done.



Author information

Authors and Affiliations

Green Platform Research Laboratories, NEC, Kawasaki, Japan
Jun Suzuki
University of California, Berkeley, USA
Shivaram Venkataraman, Sameer Agarwal, Michael Franklin & Ion Stoica


Corresponding author

Correspondence to Jun Suzuki.

Editor information

Editors and Affiliations

ICT, Chinese Academy of Sciences, Beijing, China
Jianfeng Zhan
ICT, Chinese Academy of Sciences, Beijing, China
Rui Han
Shannon (IT) Lab., Huawei, China
Chuliang Weng


About this paper

Cite this paper

Suzuki, J., Venkataraman, S., Agarwal, S., Franklin, M., Stoica, I. (2014). Record Placement Based on Data Skew Using Solid State Drives. In: Zhan, J., Han, R., Weng, C. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2014. Lecture Notes in Computer Science, vol 8807. Springer, Cham. https://doi.org/10.1007/978-3-319-13021-7_14


DOI: https://doi.org/10.1007/978-3-319-13021-7_14
Published: 11 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13020-0
Online ISBN: 978-3-319-13021-7
eBook Packages: Computer Science, Computer Science (R0)
