DOI: 10.1145/3514221.3526126
research-article

Avoiding Read Stalls on Flash Storage

Published: 11 June 2022

Abstract

When a dirty victim page is selected for replacement upon a page miss, the buffer manager must first flush the dirty victim to storage before reading the missing page. This conventional read-after-write (RAW) protocol works well on hard disks but causes read stalls on flash storage, which has asymmetric read-write speed and parallelism. Because the write and the read compete for the same buffer frame, a page-missing process has to wait for the slow write to complete before it can secure a clean frame for the missing page. This strict write-then-read serialization under-utilizes both CPU and storage, worsening transaction throughput and latency. To avoid the read stall problem on flash storage, this paper proposes the write-after-read (WAR) protocol as a new I/O architecture between the buffer manager and flash storage. In WAR, foreground processes make victim frames clean instantly by temporarily copying the dirty pages at the LRU tail into a separate DRAM space, and then read their missing pages into the cleaned frames without stalling. The dirty pages are written to storage asynchronously. By resolving the resource conflict and thus avoiding read stalls, the database engine can issue more I/Os in parallel and better utilize both CPU and storage, improving throughput and latency. We prototype WAR in two database storage engines, MySQL/InnoDB and Zero. Our comprehensive experimental results show that WAR improves transaction throughput by up to 2.9x compared to RAW.
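The two protocols differ only in how a miss with a dirty LRU-tail victim is handled. The Python sketch below models that difference under simplifying assumptions of our own: storage is an in-memory dict that logs the order in which I/Os are issued, the DRAM staging area is unbounded, and the asynchronous writer is reduced to an explicit `flush_staged()` call. All names (`Storage`, `BufferPool`, `flush_staged`) are hypothetical and are not taken from the MySQL/InnoDB or Zero prototypes. One detail the sketch has to handle to stay correct is that a page still sitting in the staging area is newer than its copy on storage, so a later miss on that page is served from DRAM.

```python
from collections import OrderedDict


class Storage:
    """Toy flash device: page_id -> bytes, plus a log of issued I/Os."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.log = []  # (op, page_id) tuples in issue order

    def read(self, page_id):
        self.log.append(("read", page_id))
        return self.pages[page_id]

    def write(self, page_id, data):
        self.log.append(("write", page_id))
        self.pages[page_id] = data


class BufferPool:
    """LRU buffer pool whose miss handling follows RAW or WAR (hypothetical API)."""

    def __init__(self, storage, capacity, protocol="WAR"):
        if protocol not in ("RAW", "WAR"):
            raise ValueError(f"unknown protocol: {protocol}")
        self.storage = storage
        self.capacity = capacity
        self.protocol = protocol
        self.frames = OrderedDict()  # page_id -> [data, dirty]; first entry is the LRU tail
        self.staged = OrderedDict()  # WAR only: dirty copies awaiting write-back (unbounded here)

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: mark as most recently used
            return self.frames[page_id][0]
        if len(self.frames) >= self.capacity:
            self._evict()
        if page_id in self.staged:
            # The newest version is still in DRAM; storage holds a stale copy.
            data = self.staged.pop(page_id)
            self.frames[page_id] = [data, True]
        else:
            data = self.storage.read(page_id)
            self.frames[page_id] = [data, False]
        return data

    def update(self, page_id, data):
        self.get(page_id)  # brings the page in and makes it most recently used
        self.frames[page_id] = [data, True]

    def _evict(self):
        victim, (data, dirty) = self.frames.popitem(last=False)  # LRU tail
        if not dirty:
            return
        if self.protocol == "RAW":
            # RAW: the caller's read cannot be issued until this write is done.
            self.storage.write(victim, data)
        else:
            # WAR: copy to DRAM so the frame is clean at once; write it later.
            self.staged[victim] = data

    def flush_staged(self):
        """Stand-in for WAR's asynchronous writer: drain staged pages to storage."""
        while self.staged:
            page_id, data = self.staged.popitem(last=False)
            self.storage.write(page_id, data)


for protocol in ("RAW", "WAR"):
    disk = Storage({1: b"a", 2: b"b"})
    pool = BufferPool(disk, capacity=1, protocol=protocol)
    pool.update(1, b"a2")  # page 1 is now dirty and is the LRU-tail victim
    pool.get(2)            # the miss on page 2 evicts dirty page 1
    print(protocol, disk.log)
# RAW [('read', 1), ('write', 1), ('read', 2)]
# WAR [('read', 1), ('read', 2)]
```

With a dirty victim, RAW issues the write before the read, whereas WAR issues the read immediately and leaves the write to `flush_staged()`. A real staging area would be bounded and would need a policy for when it fills (for example, waiting on the background writer); the sketch leaves that out.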

Supplemental Material (MP4 file)

One scheme tightly coupled with the buffer manager is the Read-After-Write (RAW) protocol. When a dirty victim is chosen for replacement on a page miss, the buffer manager first writes the victim to storage to clean the frame and only then reads the missing page into it, because the two I/O operations share a single buffer frame. This resource conflict leads to frequent read stalls, and the resulting strict I/O ordering prevents the system from fully exploiting the device's internal parallelism. To avoid the read stall problem on flash storage, this paper proposes the Write-After-Read (WAR) protocol as a new I/O architecture between the buffer manager and flash storage. In WAR, when a page miss would otherwise stall, the missing page is read first and the dirty victim is written to storage later. By replicating the conflicting resource (the buffer frame's contents) and thus avoiding read stalls, WAR can exploit the internal parallelism of flash storage and ultimately increase CPU and I/O utilization, improving throughput and latency.
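Why the read stall matters on a device whose writes are slower than its reads can be seen with a back-of-the-envelope model of the foreground critical path of a single page miss. Assume a read latency r, a write latency w, a DRAM copy cost c, and a probability d that the LRU-tail victim is dirty. Under RAW the expected miss latency is r + d*w, because the read waits for the victim's write; under WAR it is r + d*c, because only the copy remains on the critical path. The function name and the parameter values below are hypothetical, chosen only to illustrate read-write asymmetry; they are not the paper's measurements.

```python
def expected_miss_latency(protocol, read_us, write_us, copy_us, dirty_fraction):
    """Expected foreground latency (microseconds) of one page miss, simple serial model.

    dirty_fraction: probability that the LRU-tail victim is dirty.
    """
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be in [0, 1]")
    if protocol == "RAW":
        # The read cannot start until the dirty victim's write completes.
        stall = write_us
    elif protocol == "WAR":
        # The victim is copied to DRAM; its write is issued later, off the critical path.
        stall = copy_us
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return read_us + dirty_fraction * stall


# Hypothetical device parameters, chosen only to illustrate read-write asymmetry.
params = dict(read_us=100.0, write_us=500.0, copy_us=2.0, dirty_fraction=0.5)
for p in ("RAW", "WAR"):
    print(p, expected_miss_latency(p, **params))
# RAW 350.0
# WAR 101.0
```

The model deliberately ignores device queueing. Under WAR the deferred writes still consume device bandwidth, so the gain comes from taking them off the foreground path and overlapping them with reads on a device with internal parallelism, not from issuing fewer writes.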

Information

Published In

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. buffer management
  2. flash storage
  3. internal parallelism
  4. read-write asymmetry

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '22

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 144
  • Downloads (last 6 weeks): 11
Reflects downloads up to 28 Feb 2025

Cited By

  • (2025) Boosting OLTP Performance with Per-Page Logging on NVDIMM. Proceedings of the ACM on Management of Data 3:1 (1-28). DOI: 10.1145/3709667. Online publication date: 11-Feb-2025
  • (2024) Volley: Accelerating Write-Read Orders in Disaggregated Storage. Proceedings of the Nineteenth European Conference on Computer Systems (657-673). DOI: 10.1145/3627703.3650090. Online publication date: 22-Apr-2024
  • (2024) Orion: Interference-aware, Fine-grained GPU Sharing for ML Applications. Proceedings of the Nineteenth European Conference on Computer Systems (1075-1092). DOI: 10.1145/3627703.3629578. Online publication date: 22-Apr-2024
  • (2023) FlashAlloc: Dedicating Flash Blocks by Objects. Proceedings of the VLDB Endowment 16:11 (3266-3278). DOI: 10.14778/3611479.3611524. Online publication date: 1-Jul-2023
  • (2023) LRU-C: Parallelizing Database I/Os for Flash SSDs. Proceedings of the VLDB Endowment 16:9 (2364-2376). DOI: 10.14778/3598581.3598605. Online publication date: 1-May-2023
  • (2023) The Art of Losing to Win: Using Lossy Image Compression to Improve Data Loading in Deep Learning Pipelines. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (936-949). DOI: 10.1109/ICDE55515.2023.00077. Online publication date: Apr-2023
