ABSTRACT
Network virtualization requires careful control of networking resources, including link bandwidth, router memory, and packet processing time. In current high-performance packet processors, isolation and fair sharing of processing resources occur at the granularity of entire processor cores. Scaling network virtualization to a larger number of parallel slices requires a finer-grained processor-sharing mechanism. We present a novel approach, called Fair Multithreading (FMT), that allows hardware threads to share a processor core while ensuring isolation and weighted fair access. We analyze the FMT algorithm and describe a prototype implementation on a NetFPGA system. Our evaluation results indicate that FMT can be implemented at the speeds necessary to make scheduling decisions at the instruction level. We show the impact of such fine-grained processor schedulers in substrate nodes by comparing the resource utilization of virtual network slices in our system with that of traditional whole-core allocations. Our simulation results show that FMT-based substrate networks can be utilized more efficiently and can accommodate more virtual network requests. These results indicate the significant gain in system scalability that our fine-grained processor scheduling system can provide.
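To make "weighted fair access at the instruction level" concrete, the following is a minimal sketch, assuming a hypothetical core that issues one instruction per cycle from one of several hardware threads, each serving one virtual network slice. It uses stride scheduling (a virtual-time scheme), which is not necessarily the FMT algorithm analyzed in the paper; the names `HwThread`, `issue_trace`, and `STRIDE_BASE` are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

STRIDE_BASE = 1 << 20  # hypothetical scaling constant so strides stay integers


@dataclass
class HwThread:
    """One hardware thread context serving one virtual slice (hypothetical model)."""
    name: str
    weight: int          # relative share of the core's instruction issue slots
    pass_value: int = 0  # virtual time; the thread with the smallest value issues next

    @property
    def stride(self) -> int:
        # Higher weight -> smaller stride -> the thread is selected more often.
        return STRIDE_BASE // self.weight


def issue_trace(threads, cycles):
    """Return the name of the thread that issues an instruction in each cycle."""
    trace = []
    for _ in range(cycles):
        # Pick the thread furthest behind in virtual time; break ties by name
        # so the schedule is deterministic.
        t = min(threads, key=lambda th: (th.pass_value, th.name))
        trace.append(t.name)
        t.pass_value += t.stride  # charge the thread for the slot it just used
    return trace


if __name__ == "__main__":
    slices = [HwThread("A", 3), HwThread("B", 2), HwThread("C", 1)]
    trace = issue_trace(slices, 600)
    print(trace[:6])       # ['A', 'B', 'C', 'A', 'B', 'A']
    print(Counter(trace))  # Counter({'A': 300, 'B': 200, 'C': 100})
```

With weights 3:2:1, the slices receive issue slots in that ratio and interleave cycle by cycle rather than owning whole cores. The sketch assumes every thread is always ready. A real multithreaded packet processor would also have to skip threads stalled on memory, and would need to advance a returning thread's virtual time so that it cannot burst past its share after a long stall.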
Fair multithreading on packet processors for scalable network virtualization