DOI: 10.1145/2723372.2742798

Purity: Building Fast, Highly-Available Enterprise Flash Storage from Commodity Components

Published: 27 May 2015

Abstract

Although flash storage has largely replaced hard disks in consumer-class devices, enterprise workloads pose unique challenges that have slowed the adoption of flash in "performance tier" storage appliances. In this paper, we describe Purity, the foundation of Pure Storage's Flash Arrays, the first all-flash enterprise storage system to support compression, deduplication, and high availability.
Purity borrows techniques from modern database and key-value storage architectures, and introduces novel storage primitives that have wide applicability to data management systems. For instance, all writes in Purity are monotonic, and deletions are handled using an atomic predicate-based tuple elision primitive.
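The abstract does not describe how these primitives are implemented, so the following is only a minimal sketch of the general idea, not Purity's design. It assumes a toy in-memory table in which every mutation is an append, and in which a deletion is recorded as a single appended predicate (here a hypothetical key prefix plus a sequence-number bound) that hides all matching tuples at once. The names `AppendOnlyTable`, `elide_prefix`, and `compact` are illustrative inventions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (sequence number, key, value); this row layout is an assumption for illustration.
Row = Tuple[int, str, bytes]
# (key prefix, exclusive upper bound on sequence number); also hypothetical.
Elision = Tuple[str, int]


@dataclass
class AppendOnlyTable:
    """Toy table in which facts are only ever added, never updated in place."""

    rows: List[Row] = field(default_factory=list)
    elisions: List[Elision] = field(default_factory=list)
    next_seq: int = 0

    def put(self, key: str, value: bytes) -> int:
        """Append a new tuple; older tuples for the same key are left untouched."""
        seq = self.next_seq
        self.next_seq += 1
        self.rows.append((seq, key, value))
        return seq

    def elide_prefix(self, prefix: str) -> None:
        """Delete by predicate: one append hides every existing matching tuple.

        The sequence bound keeps the predicate from hiding tuples written later.
        """
        self.elisions.append((prefix, self.next_seq))

    def _visible(self, row: Row) -> bool:
        seq, key, _ = row
        return not any(key.startswith(p) and seq < bound for p, bound in self.elisions)

    def get(self, key: str) -> Optional[bytes]:
        """Return the newest visible value for key, or None if there is none."""
        for row in reversed(self.rows):
            if row[1] == key and self._visible(row):
                return row[2]
        return None

    def compact(self) -> int:
        """Background cleanup: physically drop elided tuples; returns the count removed."""
        before = len(self.rows)
        self.rows = [r for r in self.rows if self._visible(r)]
        return before - len(self.rows)


table = AppendOnlyTable()
table.put("vol1/a", b"x")
table.put("vol1/b", b"y")
table.put("vol2/a", b"z")
table.elide_prefix("vol1/")      # a single appended record removes both vol1 tuples
table.put("vol1/a", b"new")      # later writes are not affected by the earlier elision
```

In this sketch readers never see a partially applied deletion, because the deletion becomes visible through one appended record, and space is reclaimed later by `compact`.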
Purity's redundancy mechanisms are optimized for SSD failure modes and performance characteristics, allowing for fast recovery from component failures and lower space overhead than the best hard disk systems. We built deduplication and data compression schemes atop these primitives.
Flash changes storage capacity/performance tradeoffs: unlike disk-based systems, flash deployments are rarely performance-bound. A single Purity appliance can provide over 7 GiB/s of throughput on 32 KiB random I/Os, even through multiple device failures, and while providing asynchronous off-site replication. Typical installations have 99.9th-percentile latencies under 1 ms, and production arrays average 5.4x data reduction and 99.999% availability.
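To put the figures above in more familiar units, the short calculation below converts them into I/O operations per second, yearly downtime, and effective capacity. It is back-of-the-envelope arithmetic only: the helper names are hypothetical, the 10 TB raw capacity is an invented example rather than a number from the paper, and a 365-day year is assumed.

```python
GIB = 2**30
KIB = 2**10
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def iops(throughput_bytes_per_s: int, io_size_bytes: int) -> int:
    """I/O operations per second implied by a throughput and a fixed I/O size."""
    return throughput_bytes_per_s // io_size_bytes


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year at a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def effective_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Logical capacity stored on raw_tb of flash at a given data-reduction ratio."""
    return raw_tb * reduction_ratio


random_iops = iops(7 * GIB, 32 * KIB)            # 229376 I/Os per second at 7 GiB/s
downtime = downtime_minutes_per_year(0.99999)    # roughly 5.26 minutes per year
logical_tb = effective_capacity_tb(10.0, 5.4)    # roughly 54 TB on a hypothetical 10 TB raw
```

Since the abstract says "over 7 GiB/s", the I/O rate computed here is a lower bound for that workload.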
Purity takes advantage of storage performance increasing more rapidly than computational performance to build a simpler (with respect to engineering, installation, and management) scale-up storage appliance that supports hundreds of terabytes of highly-available, high-performance storage. The resulting performance and capacity support many customer deployments of multiple applications, including scale-out and parallel systems, such as MongoDB and Oracle RAC, on a single Purity appliance.

Published In

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN: 9781450327589
DOI: 10.1145/2723372
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deduplication
  2. enterprise flash storage
  3. high availability
  4. log structured storage
  5. scale up architectures
  6. storage area networks

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'15: International Conference on Management of Data
May 31 - June 4, 2015
Melbourne, Victoria, Australia

Acceptance Rates

SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) Flash-oriented Coded Storage: Research Status and Future Directions. ACM Transactions on Storage 21:1 (1-37). DOI: 10.1145/3708995. Online publication date: 19-Dec-2024
  • (2024) Explorations and Exploitation for Parity-based RAIDs with Ultra-fast SSDs. ACM Transactions on Storage 20:1 (1-32). DOI: 10.1145/3627992. Online publication date: 30-Jan-2024
  • (2024) Storage Technology Trends and Development. Data Storage Architectures and Technologies (379-428). DOI: 10.1007/978-981-97-3534-1_14. Online publication date: 28-Aug-2024
  • (2023) Vehicle To Vehicle Communication and Accident Prevention. DESIGN, CONSTRUCTION, MAINTENANCE 3 (201-207). DOI: 10.37394/232022.2023.3.18. Online publication date: 3-Nov-2023
  • (2023) ZapRAID. Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems (24-29). DOI: 10.1145/3609510.3609810. Online publication date: 24-Aug-2023
  • (2023) RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design. Proceedings of the 29th Symposium on Operating Systems Principles (182-199). DOI: 10.1145/3600006.3613170. Online publication date: 23-Oct-2023
  • (2023) Disaggregated RAID Storage in Modern Datacenters. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (147-163). DOI: 10.1145/3582016.3582027. Online publication date: 25-Mar-2023
  • (2023) Extending and Programming the NVMe I/O Determinism Interface for Flash Arrays. ACM Transactions on Storage 19:1 (1-33). DOI: 10.1145/3568427. Online publication date: 11-Jan-2023
  • (2023) Dynamic Multi-Resource Optimization for Storage Acceleration in Cloud Storage Systems. IEEE Transactions on Services Computing 16:2 (1079-1092). DOI: 10.1109/TSC.2022.3173333. Online publication date: 1-Mar-2023
  • (2023) EaD: ECC-Assisted Deduplication With High Performance and Low Memory Overhead for Ultra-Low Latency Flash Storage. IEEE Transactions on Computers 72:1 (208-221). DOI: 10.1109/TC.2022.3152665. Online publication date: 1-Jan-2023
