Software-defined QoS for I/O in exascale computing

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

Supercomputers are approaching exascale capability, which allows large computing systems to run more jobs concurrently. Because modern data-intensive scientific applications can produce millions of I/O requests per second, I/O systems frequently suffer from heavy workloads that degrade overall performance. Allocating I/O resources and guaranteeing the quality of service (QoS) for each individual application are therefore becoming increasingly important problems. In this paper, we propose SDQoS, a software-defined QoS framework based on the token bucket algorithm, which aims to meet the I/O requirements of concurrent applications contending for I/O resources and to improve the overall performance of the I/O system. Evaluation shows that SDQoS can effectively control I/O bandwidth to within a 5%–10% deviation and improve performance by up to 20% in extreme cases.
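The abstract names the token bucket algorithm as the mechanism SDQoS uses to throttle each application's I/O. As a rough illustration only (the paper's actual design is not shown on this page), the sketch below implements a classic token bucket in Python. It assumes one token represents one byte of I/O, that the refill `rate` stands for an application's target bandwidth, and that `capacity` bounds the allowed burst. `TokenBucket`, `QoSController`, and `relative_deviation` are hypothetical names, not SDQoS's API.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Classic token bucket (illustrative sketch, not the SDQoS implementation).

    Assumption: one token = one byte, so `rate` is a bandwidth in bytes/s
    and `capacity` is the largest burst admitted at once.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can simulate time
        self.tokens = capacity      # start full: an initial burst is allowed
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, never exceeding the bucket capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes: float) -> bool:
        """Admit a request of `nbytes` if enough tokens are available."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class QoSController:
    """Hypothetical per-application admission control: one bucket per app.

    Applications without a configured limit are admitted unconditionally
    (a design assumption for this sketch).
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def set_limit(self, app: str, rate: float, capacity: float) -> None:
        self.buckets[app] = TokenBucket(rate, capacity, self.clock)

    def admit(self, app: str, nbytes: float) -> bool:
        bucket = self.buckets.get(app)
        return True if bucket is None else bucket.try_consume(nbytes)


def relative_deviation(measured: float, target: float) -> float:
    """Deviation of achieved bandwidth from its target, as a fraction of the target."""
    return abs(measured - target) / target
```

The injectable `clock` lets the bucket run against simulated time in tests instead of wall-clock time. `relative_deviation` shows one plausible way to express the 5%–10% bandwidth deviation reported above, assuming deviation is measured relative to the target rate; for example, an application capped at 100 MB/s that achieves 95 MB/s has a deviation of 0.05.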

Fig. 1

Acknowledgements

This work is supported by the National Key R&D Program of China (No. 2017YFC0803700), NSFC (No. 61772218, 61433019), and the Outstanding Youth Foundation of Hubei Province (No. 2016CFA032).

Author information

Corresponding author

Correspondence to Xuanhua Shi.

About this article

Cite this article

Hua, Y., Shi, X., Jin, H. et al. Software-defined QoS for I/O in exascale computing. CCF Trans. HPC 1, 49–59 (2019). https://doi.org/10.1007/s42514-019-00005-9

  • DOI: https://doi.org/10.1007/s42514-019-00005-9
