Abstract
Storage class memory is receiving increasing attention in HPC systems as a means of accelerating intensive IO operations. We report a particular instance using SLC FLASH memory integrated with an IBM BlueGene/Q supercomputer at scale (Blue Gene Active Storage, BGAS). We describe two principal modes of operation of the non-volatile memory: 1) block device; 2) direct storage access (DSA). The block device layer, built on the DSA layer, provides compatibility with IO layers common to existing HPC IO systems (POSIX, MPIO, HDF5) and is expected to provide high performance in bandwidth-critical use cases. The novel DSA strategy enables a low-overhead, byte-addressable, asynchronous, kernel-bypass access method for very high user-space IOPS in multithreaded application environments. Here, we expose DSA through HDF5 using a custom file driver. Benchmark results for the different modes are presented, and scale-out to full system size showcases the capabilities of this technology.
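The difference between block-granular and byte-addressable access mentioned above can be illustrated with a minimal sketch. It is not the BGAS implementation: it uses an ordinary temporary file, Python's standard `mmap` module stands in for byte-addressable access, and the 4096-byte block size and the function names `block_update`, `byte_update` and `demo` are assumptions made for illustration only. Unlike DSA as described in the abstract, this stand-in is neither asynchronous nor kernel-bypass.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration only


def block_update(path, offset, payload):
    """Block-style update: read the enclosing blocks, patch them, write them back."""
    first = (offset // BLOCK_SIZE) * BLOCK_SIZE
    # Round the end of the payload up to the next block boundary.
    last = -(-(offset + len(payload)) // BLOCK_SIZE) * BLOCK_SIZE
    with open(path, "r+b") as f:
        f.seek(first)
        buf = bytearray(f.read(last - first))
        start = offset - first
        buf[start:start + len(payload)] = payload
        f.seek(first)
        f.write(buf)
    return last - first  # bytes transferred in each direction


def byte_update(path, offset, payload):
    """Byte-addressable update: touch only the bytes that change (mmap as a stand-in)."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        m[offset:offset + len(payload)] = payload
    return len(payload)  # bytes written by the caller


def demo():
    # Create a small zero-filled file spanning four hypothetical blocks.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(4 * BLOCK_SIZE))
        path = tmp.name
    try:
        moved_block = block_update(path, 5000, b"abc")
        moved_byte = byte_update(path, 9000, b"xyz")
        with open(path, "rb") as f:
            data = f.read()
        return moved_block, moved_byte, data[5000:5003], data[9000:9003]
    finally:
        os.remove(path)
```

Calling `demo()` applies one three-byte update through each path and returns the number of bytes each path had to move together with the bytes read back. The block path transfers a whole block for a three-byte change, which is the kind of overhead a byte-addressable interface is meant to avoid for small, IOPS-bound accesses.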
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Schürmann, F. et al. (2014). Rebasing I/O for Scientific Computing: Leveraging Storage Class Memory in an IBM BlueGene/Q Supercomputer. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_21
DOI: https://doi.org/10.1007/978-3-319-07518-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)