
Affinity-Based Network Interfaces for Efficient Communication on Multicore Architectures

  • Regular Paper
  • Published in: Journal of Computer Science and Technology

Abstract

Improvements in network interface performance are demanded both by applications with high communication requirements (for example, some multimedia, real-time, and high-performance computing applications) and by the availability of network links providing bandwidths of multiple gigabits per second, which can consume many processor cycles on communication tasks. Multicore architectures, the current trend in microprocessor development now that further increases in clock frequency and microarchitecture efficiency have become difficult, provide new opportunities to exploit the parallelism available in the nodes when designing efficient communication architectures. Nevertheless, although present OS network stacks include multiple threads that allow network tasks to execute concurrently in the kernel, implementing packet-based or connection-based parallelism is not trivial, as it must take into account the cost of synchronizing access to shared resources and the efficient use of caches. Therefore, a common trend in much recent research on this topic is to assign network interrupts and the corresponding protocol and network application processing to the same core, since this affinity scheduling can reduce both the contention for shared resources and the number of cache misses. In this paper we propose and analyze several configurations that distribute the network interface among the different cores available in the server. These alternatives are devised according to the affinity of the corresponding communication tasks with the location (proximity to the memories where the different data structures are stored) and characteristics of the processing core. As this approach uses several cores to accelerate the communication path of a given connection, it can be seen as complementary to approaches that use several cores to simultaneously process packets belonging to either the same or different connections. Message passing interface (MPI) workloads and dynamic web servers are considered as applications to evaluate and compare the communication performance of these alternatives. In our experiments, performed by full-system simulation, we have observed improvements of up to 35% in throughput and up to 23% in latency for MPI workloads, and up to 100% in throughput, up to 500% in response time, and up to 82% in requests served per second for dynamic web servers.
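
To make the affinity-scheduling idea concrete, the sketch below shows the standard Linux mechanism this kind of scheduling rests on: steering the NIC interrupt to a chosen core through /proc/irq/<irq>/smp_affinity, and pinning the receiving process to that same core with sched_setaffinity(), so that interrupt, protocol, and application processing share one cache hierarchy. This is a minimal illustration under stated assumptions, not the configuration evaluated in the paper; the IRQ number and core number are hypothetical placeholders, and writing smp_affinity requires root privileges.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Steer a hardware interrupt to a single core by writing a one-hot
       CPU bitmask to /proc/irq/<irq>/smp_affinity (standard Linux
       interface; requires root). */
    static int set_irq_affinity(int irq, int core)
    {
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
        f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fprintf(f, "%x\n", 1u << core);
        return fclose(f);
    }

    int main(void)
    {
        const int nic_irq = 59; /* hypothetical IRQ line of the NIC */
        const int core = 2;     /* core chosen for the whole receive path */
        cpu_set_t mask;

        if (set_irq_affinity(nic_irq, core) != 0)
            perror("smp_affinity");

        /* Pin the calling process to the same core, so that protocol and
           application processing run where the interrupt is handled. */
        CPU_ZERO(&mask);
        CPU_SET(core, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return EXIT_FAILURE;
        }

        printf("NIC IRQ %d and this process bound to core %d\n",
               nic_irq, core);
        return EXIT_SUCCESS;
    }

The configurations proposed in the paper go further, distributing different communication tasks across several cores according to their affinity with data location and core characteristics; the sketch only shows the single-core pairing that motivates them.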

Author information

Correspondence to Andrés Ortiz.

Additional information

This work was partly supported by the Ministry of Science and Innovation of Spain under Grant Nos. SAF2010-20558, TIN2012-32039 and IPT2011-1728-430000.

About this article

Cite this article

Ortiz, A., Ortega, J., Díaz, A.F. et al. Affinity-Based Network Interfaces for Efficient Communication on Multicore Architectures. J. Comput. Sci. Technol. 28, 508–524 (2013). https://doi.org/10.1007/s11390-013-1352-2
