Pipelined Parallel Join and Its FPGA-Based Acceleration

Authors:
Masato Yoshimi, University of Electro-Communications, Chofu, Tokyo
Yasin Oge, University of Electro-Communications, Chofu, Tokyo
Tsutomu Yoshinaga, University of Electro-Communications, Chofu, Tokyo

ACM Transactions on Reconfigurable Technology and Systems, Volume 10, Issue 4, Article No. 28, pp. 1–28. https://doi.org/10.1145/3079759

Published: 27 December 2017

Abstract

A huge amount of data is being generated and accumulated in data centers, which significantly increases the energy required to analyze these data. We must therefore consider redesigning current computer system architectures to better support applications based on distributed algorithms that require high data transfer rates.

Novel computer architectures that introduce dedicated accelerators for near-data processing have been proposed and developed for high-speed big-data analysis. In this work, we propose a computer system with an FPGA-based accelerator, namely, interconnected FPGAs, which offers two advantages: (1) direct data transmission and (2) offloading of computation into the data flow in the FPGA. In this article, we demonstrate that the proposed interconnected-FPGA system can accelerate join operations in a relational database. We developed a new parallel join algorithm, PPJoin, targeted at big-data analysis in a shared-nothing architecture. PPJoin extends a NUMA-based parallel join algorithm by overlapping computation on multicore processors with data communication. The data communication between computational nodes can be accelerated by direct data transmission that bypasses the main memory of the hosts. To confirm the performance of the PPJoin algorithm and its acceleration on an interconnected-FPGA platform, we evaluated a simple query on large tables. Additionally, to assess practical applicability, we evaluated an actual benchmark query. Our evaluation results confirm that PPJoin is 1.5 to 5 times faster than a software-based query engine. Moreover, we experimentally confirmed that direct data transmission by the interconnected FPGAs reduces the computation time of PPJoin by around 20%.
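To illustrate the general idea of overlapping data communication with join computation, here is a minimal Python sketch. It is not the authors' PPJoin implementation. It assumes a hash join for brevity (the per-node algorithm in the article may differ), simulates the inter-node link with a sender thread and a bounded queue, and uses hypothetical function names (partition, pipelined_join). While the sender partitions and "transmits" one chunk of S, the receiver probes the chunk it already has, so transfer and computation proceed concurrently.

```python
import queue
import threading
from collections import defaultdict


def partition(rows, num_nodes):
    """Hash-partition (key, payload) rows so equal keys land on the same node."""
    parts = [[] for _ in range(num_nodes)]
    for key, payload in rows:
        parts[hash(key) % num_nodes].append((key, payload))
    return parts


def pipelined_join(r_rows, s_chunks, num_nodes=4):
    """Join R with S, where S arrives as a stream of chunks (sketch only).

    A sender thread stands in for the interconnect: it partitions each S
    chunk and pushes it into a bounded queue, while the main thread probes
    chunks already received, so transfer and computation overlap.
    """
    # Build phase: each simulated node builds a hash table over its R partition.
    tables = []
    for part in partition(r_rows, num_nodes):
        table = defaultdict(list)
        for key, payload in part:
            table[key].append(payload)
        tables.append(table)

    # Bounded buffer: the sender can run at most two chunks ahead of the prober.
    channel = queue.Queue(maxsize=2)
    done = object()  # sentinel marking the end of the S stream

    def sender():
        for chunk in s_chunks:
            channel.put(partition(chunk, num_nodes))
        channel.put(done)

    threading.Thread(target=sender, daemon=True).start()

    results = []
    while (parts := channel.get()) is not done:
        # Probe phase: each node probes only its own hash table.
        for node, part in enumerate(parts):
            for key, s_payload in part:
                for r_payload in tables[node].get(key, ()):
                    results.append((key, r_payload, s_payload))
    return results
```

For example, joining R = [(1, 'a'), (2, 'b'), (3, 'c')] with S delivered as the chunks [(1, 'x'), (4, 'y')] and [(3, 'z'), (1, 'w')] yields, after sorting, [(1, 'a', 'w'), (1, 'a', 'x'), (3, 'c', 'z')]. The bounded queue is what creates the pipeline: a larger buffer hides more transfer latency at the cost of memory, which is the trade-off that direct FPGA-to-FPGA transmission aims to ease by not staging data in host memory.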

Index Terms

1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Published in
ACM Transactions on Reconfigurable Technology and Systems Volume 10, Issue 4
December 2017
119 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3166118
Editor:
Steve Wilton
Department of Electrical and Computer Engineering / University of British Columbia / Kaiser 4112, 5500-2332 Main Mall / Vancouver, BC V6T 1Z4 Canada
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 December 2017
- Accepted: 1 April 2017
- Revised: 1 December 2016
- Received: 1 September 2015
Author Tags
Field Programmable Gate Arrays
In-datapath computing
databases
join operation
parallel computing
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total citations: 4
- Total downloads: 396
- Downloads (last 12 months): 21
- Downloads (last 6 weeks): 4