FPGA vs. Multi-core CPUs vs. GPUs: Hands-On Experience with a Sorting Application

Grozea, Cristian; Bankovic, Zorana; Laskov, Pavel

doi:10.1007/978-3-642-16233-6_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6310)

Abstract

Currently there are several interesting alternatives for low-cost high-performance computing. We report here our experience with an N-gram extraction and sorting problem that originated in the design of a real-time network intrusion detection system. We considered FPGAs, multi-core CPUs in symmetric multi-CPU machines, and GPUs, and created an implementation for each of these platforms. After carefully comparing the advantages and disadvantages of each, we decided to go forward with the implementation written for multi-core CPUs. We present arguments for and against each platform, based on our hands-on experience, which we intend to help with selecting hardware acceleration solutions for new projects.
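
As a rough illustration of the problem the abstract names, the following is a minimal sketch of N-gram extraction followed by sorting. It assumes byte N-grams taken from a single payload and adds a counting pass over the sorted result; the function names are hypothetical, and this is not the authors' implementation, which targets FPGAs, multi-core CPUs and GPUs.

```python
def extract_ngrams(payload: bytes, n: int) -> list[bytes]:
    """Return every contiguous n-byte window of payload, in input order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A payload shorter than n yields an empty list.
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def sorted_ngram_counts(payload: bytes, n: int) -> list[tuple[bytes, int]]:
    """Sort the N-grams so equal ones become adjacent, then count each run.

    The counting step is an assumption made for illustration only.
    """
    counts: list[tuple[bytes, int]] = []
    for gram in sorted(extract_ngrams(payload, n)):
        if counts and counts[-1][0] == gram:
            counts[-1] = (gram, counts[-1][1] + 1)
        else:
            counts.append((gram, 1))
    return counts
```

Sorting is the expensive step here, so in a sketch like this it is the part one would hand to the accelerated platform; the extraction and counting passes are linear scans.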

Author information

Authors and Affiliations

Fraunhofer Institute FIRST, Kekulestrasse 7, 12489, Berlin, Germany
Cristian Grozea
ETSI Telecomunicación, Technical University of Madrid, Av. Complutense 30, 28040, Madrid, Spain
Zorana Bankovic
Wilhelm Schickard Institute for Computer Science, University of Tuebingen, Sand 1, 72076, Tuebingen, Germany
Pavel Laskov

Editor information

Editors and Affiliations

High Performance Computing Center Stuttgart (HLRS), Universität Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Rainer Keller
Institute of Computer Science and Engineering, Karlsruhe Institute of Technology, Haid-und-Neu-Str. 7, 76131, Karlsruhe, Germany
David Kramer
Engineering Mathematics and Computing Lab (EMCL) & Institute for Applied and Numerical Mathematics 4, Karlsruhe Institute of Technology, Fritz-Erler-Str. 23, 76133, Karlsruhe, Germany
Jan-Philipp Weiss

About this chapter

Cite this chapter

Grozea, C., Bankovic, Z., Laskov, P. (2010). FPGA vs. Multi-core CPUs vs. GPUs: Hands-On Experience with a Sorting Application. In: Keller, R., Kramer, D., Weiss, JP. (eds) Facing the Multicore-Challenge. Lecture Notes in Computer Science, vol 6310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16233-6_12

DOI: https://doi.org/10.1007/978-3-642-16233-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16232-9
Online ISBN: 978-3-642-16233-6
eBook Packages: Computer Science, Computer Science (R0)
