Block I/O Scheduling on Storage Servers of Distributed File Systems

Journal of Grid Computing

Abstract

This paper presents a new I/O scheduling scheme for storage servers of distributed/parallel file systems, aimed at better I/O performance. We first analyze the read/write requests in the I/O queue of a storage server (which we call block I/Os) using our proposed technique of horizontal partition. All block requests are then divided into multiple groups on the basis of their offsets. That is, all requests related to the same chunk file are grouped together and served within the same time slot, between opening and closing the target chunk file on the storage server. As a result, the time needed to complete block I/O requests can be significantly reduced, because fewer file operations are performed on the corresponding chunk files in the low-level file systems of the server machines. Furthermore, we introduce an algorithm that assigns a priority to each group of block I/O requests, and the storage server dispatches the groups in priority order. Consequently, applications with higher I/O priorities, e.g. those with fewer I/O operations and a small amount of involved data, can finish earlier. We implement a prototype of this server-side scheduling in the PARTE file system to demonstrate the feasibility and applicability of the proposed scheme. Experimental results show that the proposed scheme achieves better I/O bandwidth and less I/O time than the First Come First Served strategy, as well as other server-side I/O scheduling approaches.
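
The grouping and priority dispatch outlined in the abstract can be illustrated with a minimal Python sketch. It is not the PARTE implementation: the names `BlockRequest`, `group_by_chunk`, `group_priority` and `dispatch_order`, the 64 MiB chunk size, and the priority rule (fewer requests first, then fewer bytes) are all assumptions made for illustration, and requests that straddle a chunk boundary are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed chunk size for illustration only; the abstract does not state one.
CHUNK_SIZE = 64 * 1024 * 1024


@dataclass(frozen=True)
class BlockRequest:
    """Hypothetical block I/O request as queued on a storage server."""
    app_id: str
    offset: int          # byte offset within the logical file
    length: int          # number of bytes to read or write
    is_write: bool = False


def group_by_chunk(requests, chunk_size=CHUNK_SIZE):
    """Group requests by the chunk file that their offset falls into."""
    groups = defaultdict(list)
    for req in requests:
        groups[req.offset // chunk_size].append(req)
    return dict(groups)


def group_priority(group):
    """Assumed rule: a smaller tuple is dispatched earlier.

    Groups with fewer requests come first; ties are broken by total bytes.
    """
    return (len(group), sum(req.length for req in group))


def dispatch_order(requests, chunk_size=CHUNK_SIZE):
    """Return (chunk_index, requests) pairs in the order they would be served.

    Each pair stands for one open/serve/close cycle on a chunk file, so every
    chunk file is opened once no matter how many requests target it.
    """
    groups = group_by_chunk(requests, chunk_size)
    ordered = sorted(
        groups.items(),
        key=lambda item: (group_priority(item[1]), item[0]),
    )
    # Inside a group, serve requests in ascending offset order.
    return [(chunk, sorted(reqs, key=lambda r: r.offset)) for chunk, reqs in ordered]


if __name__ == "__main__":
    queue = [
        BlockRequest("a", 0, 4096),
        BlockRequest("b", 200, 10),
        BlockRequest("a", 50, 10),
        BlockRequest("a", 120, 10),
    ]
    for chunk, reqs in dispatch_order(queue, chunk_size=100):
        print(chunk, [r.offset for r in reqs])
```

Under a First Come First Served policy the queue in this example would touch chunk files in arrival order (0, 2, 0, 1), reopening chunk 0; the sketch instead visits each chunk file once and places the smaller groups first.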

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No. 61303038), the Fundamental Research Funds for the Central Universities (No. XDJK2017B044), and the Opening Project of State Key Laboratory for Novel Software Technology (No. KFKT2016B05).

Author information

Corresponding author

Correspondence to Xiaoning Peng.

About this article

Cite this article

Liao, J., Yin, D. & Peng, X. Block I/O Scheduling on Storage Servers of Distributed File Systems. J Grid Computing 16, 299–316 (2018). https://doi.org/10.1007/s10723-017-9423-1

  • DOI: https://doi.org/10.1007/s10723-017-9423-1
