A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems

Cluster Computing

Abstract

Processing-intensive web server requests can degrade Quality of Service (QoS), producing longer mean response times and lower throughput, which calls for a new web server software framework that improves web server performance. The request-level parallelism of web servers suits many-core accelerators such as GPUs and Intel Xeon Phi co-processors, but the traditional web server model cannot make full use of these accelerators. We propose a new web server software framework, called MIC-based Server-side Accelerator Framework (MSAF), based on Staged Event Driven Architecture (SEDA), for machines that have both multi-core CPUs and Intel Xeon Phi co-processors. The framework can fully exploit the performance of the Intel Xeon Phi co-processors and the multi-core CPUs, and it improves power/energy efficiency by offloading the request-handling stage to the Intel Xeon Phi co-processors. We implemented web server simulation software based on MSAF on a machine with multi-core CPUs and Intel Xeon Phi co-processors, and evaluated it with Apache Benchmark (AB). Our evaluation shows that the performance of MSAF is roughly equivalent to that of a web server cluster consisting of four to five computing nodes. The results indicate that, when MSAF is applied, Intel Xeon Phi co-processors are suitable for server-side software such as web servers, DNS servers, and database servers, because of the lower communication latency between the co-processors and the host, their stronger logic processing ability, and their higher energy efficiency.
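
To make the staged, event-driven idea in the abstract concrete, the following is a minimal sketch of a SEDA-style pipeline, not the authors' MSAF implementation. Each stage owns an input queue and a small worker pool, and the request-handling stage gets the largest pool as a stand-in for the stage that MSAF offloads to the co-processors. All names (`Stage`, `parse`, `handle`, `respond`, `run_pipeline`) and the request format are assumptions made for illustration, and ordinary Python threads stand in for both CPU cores and co-processor cores; no actual offload takes place.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage's workers to exit


class Stage:
    """One SEDA-style stage: an input queue drained by a worker pool."""

    def __init__(self, name, handler, workers, downstream=None):
        self.name = name
        self.handler = handler        # function applied to each event
        self.downstream = downstream  # next Stage, or None for the last one
        self.inbox = queue.Queue()
        self.results = []             # only filled by the last stage
        self._lock = threading.Lock()
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]

    def start(self):
        for t in self._threads:
            t.start()

    def submit(self, event):
        self.inbox.put(event)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            out = self.handler(event)
            if self.downstream is not None:
                self.downstream.submit(out)
            else:
                with self._lock:
                    self.results.append(out)

    def shutdown(self):
        # One sentinel per worker, then wait until every worker has exited.
        for _ in self._threads:
            self.inbox.put(_STOP)
        for t in self._threads:
            t.join()


def parse(raw):
    # Hypothetical request format: "GET /square/<n>"
    return int(raw.rsplit("/", 1)[1])


def handle(n):
    # Stand-in for the processing-intensive stage (the one MSAF offloads).
    return n, n * n


def respond(result):
    n, value = result
    return f"{n}:{value}"


def run_pipeline(requests):
    respond_stage = Stage("respond", respond, workers=1)
    handle_stage = Stage("handle", handle, workers=4, downstream=respond_stage)
    parse_stage = Stage("parse", parse, workers=2, downstream=handle_stage)
    stages = [parse_stage, handle_stage, respond_stage]
    for stage in stages:
        stage.start()
    for request in requests:
        parse_stage.submit(request)
    # Shut down upstream stages first so no event is left behind in a queue.
    for stage in stages:
        stage.shutdown()
    return sorted(respond_stage.results)


if __name__ == "__main__":
    print(run_pipeline(["GET /square/2", "GET /square/3"]))
```

Because the stages communicate only through queues, the size of each worker pool can be tuned independently, which is the property that lets a single stage be moved to different hardware without changing the others.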

Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. PT1607) and CHEMCLOUDCOMPUTING@BUCT.

Author information

Corresponding author

Correspondence to Guohua You.

About this article

Cite this article

You, G., Wang, X. A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems. Cluster Comput 23, 2591–2608 (2020). https://doi.org/10.1007/s10586-019-03030-z

  • DOI: https://doi.org/10.1007/s10586-019-03030-z
