A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems

You, Guohua; Wang, Xuejing

doi:10.1007/s10586-019-03030-z

Published: 01 January 2020

Volume 23, pages 2591–2608, (2020)
Cluster Computing

Guohua You¹ &
Xuejing Wang¹

Abstract

Processing-intensive web server requests can lead to low Quality of Service (QoS), such as longer mean response time and lower throughput, which calls for a new web server software framework that can improve web server performance. The request-level parallelism of web servers suits many-core accelerators, such as GPUs and Intel Xeon Phi co-processors, but the traditional web server model cannot make full use of these accelerators. We propose a new web server software framework, called the MIC-based Server-side Accelerator Framework (MSAF), for machines that have both multi-core CPUs and Intel Xeon Phi co-processors. MSAF is based on the Staged Event Driven Architecture (SEDA). The framework can fully exploit the performance of Intel Xeon Phi co-processors and multi-core CPUs, and it improves power/energy efficiency by offloading the request-handling stage to the Intel Xeon Phi co-processors. We implemented web server simulation software based on MSAF on a machine with multi-core CPUs and Intel Xeon Phi co-processors, and evaluated it with Apache Benchmark (AB). Our evaluation shows that the performance of MSAF is roughly equivalent to that of a web server cluster consisting of four to five computing nodes. This paper indicates that, when MSAF is applied, Intel Xeon Phi co-processors are suitable for server-side software, such as web servers, DNS servers, and database servers, because of their lower communication latency with the host, more powerful logic processing ability, and greater energy efficiency.
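The staged structure that SEDA prescribes can be illustrated with a minimal Python sketch. This is not the authors' MSAF implementation: the `Stage` class, the three stage names, the worker counts, and the toy Fibonacci workload are all assumptions made for illustration. The larger thread pool on the `handle` stage merely stands in for the request-handling stage that the abstract says is offloaded to the co-processor; no actual offload to a Xeon Phi takes place here.

```python
import queue
import threading

STOP = object()  # sentinel telling a stage's workers to shut down


class Stage:
    """One SEDA-style stage: an input queue drained by a small thread pool."""

    def __init__(self, name, handler, workers, downstream=None):
        self.name = name
        self.handler = handler
        self.downstream = downstream
        self.inbox = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def start(self):
        for t in self.threads:
            t.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is STOP:
                break
            result = self.handler(event)
            if self.downstream is not None:
                self.downstream.inbox.put(result)

    def stop(self):
        # One sentinel per worker, queued behind any pending events.
        for _ in self.threads:
            self.inbox.put(STOP)
        for t in self.threads:
            t.join()


def parse(raw):
    # Hypothetical request format: "GET /fib/20" -> ("/fib/20", 20)
    path = raw.split()[1]
    return path, int(path.rsplit("/", 1)[1])


def handle(request):
    # Toy processing-intensive work (iterative Fibonacci); a stand-in for
    # the stage that would run on the accelerator.
    path, n = request
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return path, a


def run_pipeline(raw_requests):
    responses = {}
    lock = threading.Lock()

    def respond(result):
        path, value = result
        with lock:
            responses[path] = f"200 OK {value}"

    respond_stage = Stage("respond", respond, workers=1)
    handle_stage = Stage("handle", handle, workers=4, downstream=respond_stage)
    parse_stage = Stage("parse", parse, workers=1, downstream=handle_stage)
    stages = [parse_stage, handle_stage, respond_stage]

    for stage in stages:
        stage.start()
    for raw in raw_requests:
        parse_stage.inbox.put(raw)
    # Stop front to back so each stage drains before its downstream stops.
    for stage in stages:
        stage.stop()
    return responses


if __name__ == "__main__":
    print(run_pipeline(["GET /fib/10", "GET /fib/20"]))
```

Because each stage owns its queue and thread pool, the pool size of a single stage can be tuned (or the stage moved to different hardware) without touching the others, which is the property a SEDA-based design relies on.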

Acknowledgements

This paper has been supported by the Fundamental Research Funds for the Central Universities (Grant No. PT1607), and CHEMCLOUDCOMPUTING@BUCT.

Author information

Authors and Affiliations

College of Information Science & Technology, Center for Information Technology, Beijing University of Chemical Technology, Beijing, 100029, People’s Republic of China
Guohua You & Xuejing Wang

Authors

Guohua You
Xuejing Wang

Corresponding author

Correspondence to Guohua You.

About this article

Cite this article

You, G., Wang, X. A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems. Cluster Comput 23, 2591–2608 (2020). https://doi.org/10.1007/s10586-019-03030-z

Received: 21 May 2018
Revised: 28 March 2019
Accepted: 19 December 2019
Published: 01 January 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10586-019-03030-z
