DOI: 10.1145/3078468.3078477

FlashNet: flash/network stack co-design

Published: 22 May 2017


During the past decade, network and storage devices have undergone rapid performance improvements, delivering ultra-low latency and several Gbps of bandwidth. Nevertheless, current network and storage stacks fail to deliver this hardware performance to applications, often because stalled CPU performance erodes IO efficiency. While many efforts address this issue in either the network stack or the storage stack alone, achieving high performance for networked-storage applications requires a holistic approach that considers both.
In this paper, we present FlashNet, a software IO stack that unifies high-performance network properties with flash storage access and management. FlashNet builds on RDMA principles and abstractions to provide a direct, asynchronous, end-to-end data path between a client and remote flash storage. The key insight behind FlashNet is to co-design the stack's components (an RDMA controller, a flash controller, and a file system) to enable cross-stack optimizations and maximize IO efficiency. In micro-benchmarks, FlashNet improves 4 kB network IOPS by 38.6% to 1.22M, decreases access latency by 43.5% to 50.4 µs, and prolongs flash lifetime by 1.6× to 5.9× for writes. We illustrate the capabilities of FlashNet by building a key-value (KV) store on it and by porting to it a distributed data store that uses RDMA. Using FlashNet's RDMA API improves the performance of the KV store by 2×, and the ported data store needs only minimal changes to access remote flash devices.
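
The headline micro-benchmark figures can be put in context with a little arithmetic. The sketch below is not from the paper: it assumes each percentage is measured relative to an unnamed baseline stack, and that "4 kB" means 4096 bytes. Under those assumptions it back-computes the implied baseline values and the network throughput that 1.22M IOPS corresponds to. The helper names are hypothetical.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
# Assumptions (not stated in the abstract): percentages are relative to a
# baseline stack, and a "4 kB" IO is 4096 bytes.

IO_SIZE_BYTES = 4096


def baseline_from_increase(after: float, pct: float) -> float:
    """Value before a pct% increase that ended at `after`."""
    return after / (1 + pct / 100)


def baseline_from_decrease(after: float, pct: float) -> float:
    """Value before a pct% decrease that ended at `after`."""
    return after / (1 - pct / 100)


flashnet_iops = 1.22e6        # from the abstract
flashnet_latency_us = 50.4    # from the abstract

baseline_iops = baseline_from_increase(flashnet_iops, 38.6)
baseline_latency_us = baseline_from_decrease(flashnet_latency_us, 43.5)

# Payload throughput implied by the IOPS figure, in Gbit/s.
throughput_gbps = flashnet_iops * IO_SIZE_BYTES * 8 / 1e9

print(f"implied baseline IOPS:    {baseline_iops:,.0f}")
print(f"implied baseline latency: {baseline_latency_us:.1f} us")
print(f"implied throughput:       {throughput_gbps:.1f} Gbit/s")
```

Read this way, the reported figures imply a baseline of roughly 0.88M IOPS and about 89 µs of access latency, and a payload rate near 40 Gbit/s at the reported IOPS. These derived values are estimates under the stated assumptions, not results reported by the authors.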








Published In

SYSTOR '17: Proceedings of the 10th ACM International Systems and Storage Conference
May 2017
195 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.



  • TCE: Technion Computer Engineering Center
  • USENIX Assoc: USENIX Assoc


Association for Computing Machinery

New York, NY, United States




Author Tags

  1. RDMA
  2. networked flash
  3. operating systems
  4. performance


  • Research-article



Acceptance Rates

Overall Acceptance Rate 108 of 323 submissions, 33%



Cited By

  • (2024) A Study on the Communication Effect of Chinese Traditional Sports Culture on a Global Scale Based on High-Dimensional Data Processing. Applied Mathematics and Nonlinear Sciences 9:1. DOI: 10.2478/amns-2024-3601. Online publication date: 27-Nov-2024
  • (2024) Research on machine learning based processing strategies for large-scale datasets. Applied Mathematics and Nonlinear Sciences 9:1. DOI: 10.2478/amns-2024-2977. Online publication date: 9-Oct-2024
  • (2024) Reviving Storage Systems Education in the 21st Century: An experience report. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 616-625. DOI: 10.1109/CCGrid59990.2024.00074. Online publication date: 6-May-2024
  • (2023) CPU-free Computing: A Vision with a Blueprint. Proceedings of the 19th Workshop on Hot Topics in Operating Systems, pp. 1-14. DOI: 10.1145/3593856.3595906. Online publication date: 22-Jun-2023
  • (2019) Multi-queue fair queueing. Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, pp. 301-314. DOI: 10.5555/3358807.3358834. Online publication date: 10-Jul-2019
  • (2018) The case of FEMU. Proceedings of the 16th USENIX Conference on File and Storage Technologies, pp. 83-90. DOI: 10.5555/3189759.3189767. Online publication date: 12-Feb-2018
  • (2018) Elevating Commodity Storage with the SALSA Host Translation Layer. 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 277-292. DOI: 10.1109/MASCOTS.2018.00035. Online publication date: Sep-2018
