research-article

Pipelined Parallel Join and Its FPGA-Based Acceleration

Published: 27 December 2017

Abstract

A huge amount of data is being generated and accumulated in data centers, which significantly increases the energy required to analyze these data. Thus, we must consider redesigning current computer system architectures to better support applications based on distributed algorithms that require high data-transfer rates.

Novel computer architectures that introduce dedicated accelerators to enable near-data processing have been discussed and developed for high-speed big-data analysis. In this work, we propose a computer system with an FPGA-based accelerator, namely, interconnected-FPGAs, which offers two advantages: (1) direct data transmission and (2) offloading of computation to data-flow processing in the FPGA. In this article, we demonstrate the capability of the proposed interconnected-FPGAs system to accelerate join operations in a relational database. We developed a new parallel join algorithm, PPJoin, targeted at big-data analysis in a shared-nothing architecture. PPJoin extends the NUMA-based parallel join algorithm by overlapping computation on multicore processors with data communication. The data communication between computational nodes can be accelerated by direct data transmission that does not pass through the main memory of the hosts. To confirm the performance of the PPJoin algorithm and its acceleration on the interconnected-FPGAs platform, we evaluated a simple query on large tables. Additionally, to demonstrate practical applicability, we also evaluated an actual benchmark query. Our evaluation results confirm that PPJoin is 1.5 to 5 times faster than a software-based query engine. Moreover, we experimentally confirmed that direct data transmission by the interconnected FPGAs reduces the computation time of PPJoin by around 20%.
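The abstract states only that PPJoin overlaps multicore computation with inter-node data communication in a shared-nothing setting. The Python sketch below illustrates that general idea under our own assumptions. It is not the authors' PPJoin, and the abstract does not say which local join method is used.

Here both tables are hash-partitioned by join key so that matching keys land on the same node (an assumed partitioning scheme). Each node builds a hash table on its share of R; a hash join is swapped in purely for illustration. It then probes chunks of S as a sender thread delivers them, so probing one chunk overlaps with the "transfer" of the next. The thread and bounded queue are hypothetical stand-ins for a network or FPGA-to-FPGA link. The names `partition`, `pipelined_node_join`, `parallel_join`, and `chunk_size` are illustrative only.

```python
import queue
import threading
from collections import defaultdict


def partition(rows, num_nodes):
    """Hash-partition (key, payload) rows so equal keys land on the same node."""
    parts = [[] for _ in range(num_nodes)]
    for key, payload in rows:
        parts[hash(key) % num_nodes].append((key, payload))
    return parts


def chunked(rows, size):
    """Split a node's incoming rows into fixed-size transfer chunks."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def pipelined_node_join(build_rows, incoming_chunks):
    """Join local build rows with probe chunks as soon as each chunk arrives."""
    # Build phase: hash table on the locally owned partition of R.
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    # Hypothetical communication channel; maxsize=2 bounds buffering so the
    # sender can stay at most a couple of chunks ahead of the prober.
    channel = queue.Queue(maxsize=2)
    done = object()  # sentinel marking the end of the transfer

    def sender():
        for chunk in incoming_chunks:
            channel.put(chunk)  # stand-in for sending one chunk over the link
        channel.put(done)

    t = threading.Thread(target=sender)
    t.start()

    # Probe phase: runs while the sender is still delivering later chunks.
    results = []
    while True:
        chunk = channel.get()
        if chunk is done:
            break
        for key, payload in chunk:
            for build_payload in table.get(key, ()):
                results.append((key, build_payload, payload))
    t.join()
    return results


def parallel_join(r_rows, s_rows, num_nodes=4, chunk_size=2):
    """Equi-join R and S; nodes are simulated one after another for clarity."""
    r_parts = partition(r_rows, num_nodes)
    s_parts = partition(s_rows, num_nodes)
    out = []
    for node in range(num_nodes):
        out.extend(pipelined_node_join(r_parts[node],
                                       chunked(s_parts[node], chunk_size)))
    return out


if __name__ == "__main__":
    R = [(1, "r1"), (2, "r2"), (3, "r3")]
    S = [(1, "s1"), (3, "s3a"), (3, "s3b"), (4, "s4")]
    print(sorted(parallel_join(R, S)))
    # [(1, 'r1', 's1'), (3, 'r3', 's3a'), (3, 'r3', 's3b')]
```

In an actual shared-nothing deployment, the nodes would run concurrently and the queue would be replaced by real data transfers. The sketch runs the nodes sequentially so the example stays small and deterministic; only the overlap within each node is modeled.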


          • Published in

            ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 4
            December 2017
            119 pages
            ISSN: 1936-7406
            EISSN: 1936-7414
            DOI: 10.1145/3166118
            • Editor: Steve Wilton

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 27 December 2017
            • Accepted: 1 April 2017
            • Revised: 1 December 2016
            • Received: 1 September 2015
            Qualifiers

            • research-article
            • Research
            • Refereed