NeuroLPM - Scaling Longest Prefix Match Hardware with Neural Networks

Published: 08 December 2023

Abstract

Longest Prefix Match (LPM) engines are broadly used in computer systems, especially in modern network devices such as Network Interface Cards (NICs), switches, and routers. However, existing LPM hardware fails to scale to the millions of rules required by modern systems. It is also often optimized for specific applications, which makes its performance sensitive to the structure of the LPM rules.
We describe NeuroLPM, a new architecture for multi-purpose LPM hardware that replaces queries in traditional memory-intensive trie and hash-table data structures with inference in a lightweight neural-network-based model called RQRMI. NeuroLPM scales to millions of rules under a small on-die SRAM budget and achieves stable, rule-structure-agnostic performance, allowing its use in a variety of applications. We solve several unique challenges in implementing RQRMI inference in hardware, including minimizing the amount of floating-point computation while maintaining query correctness, and scaling the rule-set size while ensuring small, deterministic off-chip memory bandwidth.
We prototype NeuroLPM in Verilog and evaluate it on real-world packet forwarding rule-sets and network traces. NeuroLPM offers substantial scalability benefits without any application-specific optimizations. For example, it is the only algorithm that can serve a 950K-rule rule-set at an average of 196M queries per second with 4.5MB of SRAM, within 2% of the best-case throughput that the state-of-the-art Tree Bitmap and SAIL achieve on smaller rule-sets. With 2MB of SRAM, it reduces the DRAM bandwidth per query, the dominant performance factor, by up to 9× and 3× compared to the state-of-the-art.
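To make the idea of replacing trie queries with model inference more concrete, the following is a minimal software sketch, not the NeuroLPM design. It assumes IPv4 prefixes, flattens them into disjoint address ranges, and uses a single linear model with a measured error bound as a stand-in for RQRMI, which the abstract describes only as a lightweight neural-network-based model. The names `prefixes_to_ranges`, `LearnedLpm`, and `RULES` are hypothetical and chosen for illustration.

```python
import bisect
import ipaddress

ADDR_SPACE = 2 ** 32  # assumption: IPv4 only


def prefixes_to_ranges(rules):
    """Flatten (cidr, action) rules into sorted, disjoint range starts.

    Returns (starts, actions): addresses in [starts[i], starts[i+1])
    all share actions[i], the action of their longest matching prefix.
    """
    nets = []
    points = {0}  # every address where the best match may change
    for cidr, action in rules:
        net = ipaddress.ip_network(cidr)
        lo, hi = int(net.network_address), int(net.broadcast_address)
        nets.append((lo, hi, net.prefixlen, action))
        points.add(lo)
        if hi + 1 < ADDR_SPACE:
            points.add(hi + 1)
    starts, actions = [], []
    for p in sorted(points):
        covering = [n for n in nets if n[0] <= p <= n[1]]
        # The longest covering prefix wins; None means "no rule matches".
        action = max(covering, key=lambda n: n[2])[3] if covering else None
        if actions and actions[-1] == action:
            continue  # merge adjacent ranges with the same action
        starts.append(p)
        actions.append(action)
    return starts, actions


class LearnedLpm:
    """Range lookup via a linear model plus a bounded local search."""

    def __init__(self, rules):
        self.starts, self.actions = prefixes_to_ranges(rules)
        last = self.starts[-1]
        # Stand-in model: one line through the first and last range start.
        self.slope = (len(self.starts) - 1) / last if last > 0 else 0.0
        # Worst-case prediction error, measured on the range starts.
        self.err = max(abs(self._predict(s) - i)
                       for i, s in enumerate(self.starts))

    def _predict(self, key):
        # Monotone in key, clamped to a valid index.
        return min(int(self.slope * key), len(self.starts) - 1)

    def lookup(self, addr):
        key = int(ipaddress.ip_address(addr))
        p = self._predict(key)
        # A key inside range i can be predicted as late as range i+1,
        # so the window extends one extra slot below the prediction.
        lo = max(0, p - self.err - 1)
        hi = min(len(self.starts), p + self.err + 1)  # exclusive bound
        idx = bisect.bisect_right(self.starts, key, lo, hi) - 1
        return self.actions[idx]


RULES = [
    ("0.0.0.0/0", "default"),
    ("10.0.0.0/8", "A"),
    ("10.1.0.0/16", "B"),
    ("10.1.2.0/24", "C"),
    ("192.168.0.0/16", "D"),
]
lpm = LearnedLpm(RULES)
print(lpm.lookup("10.1.2.3"))  # "C": the /24 is the longest match
print(lpm.lookup("10.1.3.4"))  # "B": falls back to the /16
```

In this sketch the model only narrows the search window; the final binary search inside the error bound is what keeps the result exact. A single line fits clustered rule-sets poorly, so the window here can span most of the array, which is the kind of cost a more accurate model is meant to reduce.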

Published In

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
October 2023
1528 pages
ISBN: 9798400703294
DOI: 10.1145/3613424
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
