ABSTRACT
An Accelerated Processing Unit (APU) is a heterogeneous multicore processor that contains general-purpose CPU cores and a GPU on a single chip. It also supports the Heterogeneous System Architecture (HSA), which provides coherent, physically shared memory between the CPU and the GPU. In this paper, we present the design and implementation of a high-performance IPsec gateway using a low-cost commodity embedded APU. The HSA supported by APUs eliminates the data copy overhead between the CPU and the GPU, which is unavoidable in previous discrete-GPU approaches. The gateway is implemented in OpenCL to exploit the GPU and uses the zero-copy packet I/O APIs in DPDK. The IPsec gateway handles real-world network traffic, in which each packet carries a different workload. The proposed packet scheduling algorithm significantly improves GPU utilization for such traffic, and it works not only for APUs but also for discrete GPUs. With three CPU cores and one GPU in the APU, the IPsec gateway achieves a throughput of 10.36 Gbps with an average latency of 2.79 ms when performing AES-CBC+HMAC-SHA1 on incoming packets of 1024 bytes.
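To put the reported figures in perspective, the following back-of-the-envelope sketch converts the throughput into a packet rate and estimates how many packets are in the gateway at once. It is an illustration only, not part of the paper: it assumes the 10.36 Gbps figure counts exactly the 1024 bytes of each packet (no Ethernet framing overhead), assumes steady state, and applies Little's law; the function names are hypothetical.

```python
def packets_per_second(throughput_bps: float, packet_bytes: int) -> float:
    """Packet rate implied by a bit rate and a fixed packet size.

    Assumption: the bit rate counts only the packet bytes themselves.
    """
    return throughput_bps / (packet_bytes * 8)


def packets_in_flight(rate_pps: float, latency_s: float) -> float:
    """Average number of packets inside the system (Little's law: L = rate * latency)."""
    return rate_pps * latency_s


if __name__ == "__main__":
    # Figures taken from the abstract: 10.36 Gbps, 2.79 ms, 1024-byte packets.
    rate = packets_per_second(10.36e9, 1024)
    in_flight = packets_in_flight(rate, 2.79e-3)
    print(f"packet rate: {rate / 1e6:.2f} Mpps")
    print(f"packets in flight: {in_flight:.0f}")
```

Under these assumptions the gateway processes roughly 1.26 million packets per second and holds about 3,500 packets in flight on average, which is consistent with batching many packets per GPU launch.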
Index Terms
- PIPSEA: A Practical IPsec Gateway on Embedded APUs