DOI:10.1145/3411029.3411033
research-article

Leveraging SIMD parallelism for accelerating network applications

Published: 11 August 2020

Abstract

Software packet processing frameworks are critical components of modern network architectures, because their performance directly affects the quality of network services. Motivated by the growing number and capability of advanced vector instructions in recent mainstream CPUs, this paper explores a new parallel processing design and implementation of data structures and algorithms that are frequently used to build network applications. In particular, we propose effective SIMD optimization techniques for the Bloom filter and the Open vSwitch megaflow cache. Our design reduces memory access latency through careful prefetching and a new design that meets the needs of fast data-consuming instructions. Our evaluation shows performance improvements of up to 162% for the Bloom filter and 48% for Open vSwitch compared to their scalar versions.
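As a rough illustration of the prefetch-then-probe idea mentioned above, the following is a minimal scalar sketch in C of a batched Bloom filter lookup. It is not taken from the paper: the filter size, the number of hash probes, the batch width, the mixing function, and all names (`bloom_t`, `bloom_query_batch`, and so on) are assumptions chosen for illustration, and `__builtin_prefetch` is a GCC/Clang builtin. The batch of eight keys stands in for the lanes that a SIMD implementation would process in one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_WORDS  1024u  /* hypothetical size: 1024 x 64 = 65536 bits */
#define BLOOM_HASHES 4u     /* hypothetical number of probes per key */
#define BATCH        8u     /* keys per batch, analogous to 8 SIMD lanes */

typedef struct { uint64_t words[BLOOM_WORDS]; } bloom_t;

static void bloom_init(bloom_t *b) { memset(b, 0, sizeof *b); }

/* 64-bit mixer (MurmurHash3 finalizer); a stand-in for any good hash. */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* Bit position of probe i for a key, derived by double hashing. */
static uint32_t bit_index(uint64_t key, uint32_t i) {
    uint64_t h = mix64(key);
    uint32_t h1 = (uint32_t)h;
    uint32_t h2 = (uint32_t)(h >> 32) | 1u;  /* force an odd stride */
    return (h1 + i * h2) % (BLOOM_WORDS * 64u);
}

static void bloom_add(bloom_t *b, uint64_t key) {
    for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
        uint32_t bit = bit_index(key, i);
        b->words[bit / 64] |= 1ULL << (bit % 64);
    }
}

/* Looks up n <= BATCH keys; bit j of the result is set if keys[j] may be
 * present. Phase 1 computes every probe position and prefetches its word,
 * so the memory loads of the whole batch overlap. Phase 2 tests the bits. */
static uint32_t bloom_query_batch(const bloom_t *b,
                                  const uint64_t *keys, size_t n) {
    uint32_t idx[BATCH][BLOOM_HASHES];
    uint32_t hits = 0;
    assert(n <= BATCH);

    for (size_t j = 0; j < n; j++) {
        for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
            idx[j][i] = bit_index(keys[j], i);
            __builtin_prefetch(&b->words[idx[j][i] / 64]);
        }
    }
    for (size_t j = 0; j < n; j++) {
        uint32_t present = 1;
        for (uint32_t i = 0; i < BLOOM_HASHES; i++) {
            uint32_t bit = idx[j][i];
            present &= (uint32_t)((b->words[bit / 64] >> (bit % 64)) & 1u);
        }
        hits |= present << j;
    }
    return hits;
}
```

A caller would initialize a filter with `bloom_init`, insert keys with `bloom_add`, and then pass up to eight keys at a time to `bloom_query_batch`. Separating address generation from bit testing is the part that maps onto vector hardware: in a SIMD version, the first loop would typically become vectorized hashing plus prefetch or gather, and the second a vectorized test, though the exact instructions depend on the target CPU.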

Published In

APNet '20: Proceedings of the 4th Asia-Pacific Workshop on Networking
August 2020
57 pages
ISBN:9781450388764
DOI:10.1145/3411029
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APNet '20
APNet '20: 4th Asia-Pacific Workshop on Networking
August 3 - 4, 2020
Seoul, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 50 of 118 submissions, 42%
