
BlueDBM: Distributed Flash Storage for Big Data Analytics

Published: 30 June 2016

Abstract

Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5TB to 20TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. However, currently available off-the-shelf flash storage packaged as SSDs does not make effective use of flash storage because it incurs a great amount of additional overhead during flash device management and network access. In this article, we present BlueDBM, a new system architecture that has flash-based storage with in-store processing capability and a low-latency high-throughput intercontroller network between storage devices. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a DRAM-centric system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost/performance tradeoff for Big Data analytics.
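The capacity argument above is simple arithmetic, and the sensitivity to secondary-storage references can be illustrated with a weighted-average access time. The following is a minimal sketch, not the article's model: it assumes decimal units (1 TB = 1000 GB), and the two latency constants are illustrative placeholders rather than figures from the article. The function names are hypothetical.

```python
def aggregate_dram_tb(servers, gb_per_server):
    """Total DRAM across a cluster in TB, assuming 1 TB = 1000 GB."""
    return servers * gb_per_server / 1000


def average_access_ns(miss_fraction, dram_ns, storage_ns):
    """Weighted-average latency when miss_fraction of references
    go to secondary storage instead of DRAM."""
    return (1 - miss_fraction) * dram_ns + miss_fraction * storage_ns


# Cluster sizes named in the abstract: 100 servers, 128GB to 256GB each.
low_tb = aggregate_dram_tb(100, 128)
high_tb = aggregate_dram_tb(100, 256)
print(f"Aggregate DRAM: {low_tb:.1f} TB to {high_tb:.1f} TB")

# ASSUMED latencies, chosen only to show the shape of the effect.
DRAM_NS = 100          # assumption: DRAM access latency
STORAGE_NS = 100_000   # assumption: secondary-storage access latency

for miss in (0.0, 0.05, 0.10):
    avg = average_access_ns(miss, DRAM_NS, STORAGE_NS)
    print(f"{miss:.0%} of references to storage: "
          f"{avg:.0f} ns average, {avg / DRAM_NS:.1f}x DRAM-only")
```

With these assumed latencies, sending 5% of references to secondary storage raises the average access time by roughly a factor of 51. The factor depends entirely on the latency ratio chosen, so it shows only why a small miss fraction can dominate, not how large the effect is on any real system.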



Published in

ACM Transactions on Computer Systems, Volume 34, Issue 3
September 2016
103 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/2966277

Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 30 June 2016
• Accepted: 1 March 2016
• Received: 1 February 2016


Qualifiers

• research-article
• Research
• Refereed
