Abstract
Packet classification is crucial for the Internet to provide more value-added services and guaranteed quality of service. Besides hardware-based solutions, many software-based classification algorithms have been proposed. However, classifying at 10 Gbps or higher remains a challenging problem and is still one of the performance bottlenecks in core routers. In general, classification algorithms face the same challenge of balancing high classification speed against low memory requirements. This paper proposes a modified recursive flow classification (RFC) algorithm, Bitmap-RFC, which significantly reduces the memory requirements of RFC by applying a bitmap compression technique. To increase classification speed, we exploit multithreaded architectural features at every stage of development, from algorithm design to implementation. As a result, Bitmap-RFC strikes a good balance between speed and space: it maintains high classification speed while significantly reducing memory consumption. This paper investigates the main NPU software design aspects that have a dramatic performance impact on any NPU-based implementation: memory space reduction, instruction selection, data allocation, task partitioning, and latency hiding. We experiment with an architecture-aware design principle to guarantee high performance of the classification algorithm on an NPU implementation. The experimental results show that the Bitmap-RFC algorithm achieves 10 Gbps or higher and scales well on the Intel IXP2800 NPU.
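The abstract does not spell out the bitmap encoding, so the following is only a generic sketch of bitmap compression for a lookup table that contains long runs of repeated values, which is the kind of redundancy such techniques typically target. The function names `compress` and `lookup` and the sample table are hypothetical and are not taken from the paper.

```python
def compress(table):
    """Compress a table that contains runs of equal values.

    Returns (bitmap, values). Bit i of the bitmap is 1 when table[i]
    starts a new run; values holds one entry per run.
    (Illustrative layout only; the paper's actual encoding may differ.)
    """
    bitmap = 0
    values = []
    for i, v in enumerate(table):
        if i == 0 or v != table[i - 1]:
            bitmap |= 1 << i
            values.append(v)
    return bitmap, values


def lookup(bitmap, values, i):
    """Recover the original table[i] from the compressed form."""
    mask = (1 << (i + 1)) - 1              # keep bits 0..i
    rank = bin(bitmap & mask).count("1")   # runs started at or before i
    return values[rank - 1]


# Hypothetical table: 16 entries, but only 4 runs.
table = [7, 7, 7, 7, 3, 3, 9, 9, 9, 9, 9, 9, 7, 7, 7, 7]
bitmap, values = compress(table)
```

Here the 16-entry table shrinks to a 16-bit bitmap plus four stored values, at the cost of one population count per lookup. On hardware with a fast bit-count instruction that extra step is cheap, which is one reason this trade-off can suit network processors.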
Index Terms
- High-performance packet classification algorithm for multithreaded IXP network processor