Abstract
Reliable performance evaluations require representative workloads. This has led to the use of accounting logs from production systems as a source of workload data for simulations. But using such logs directly suffers from several deficiencies: a log captures only one specific situation, and it lacks flexibility, in that the workload cannot be adjusted as needed. Creating workload models solves some of these problems but creates others, most notably the danger of missing important details that were not recognized in advance and therefore not included in the model. Resampling addresses many of these deficiencies by combining the best of both worlds. It is based on partitioning real workloads into basic components (specifically, the job streams contributed by different users), and then generating new workloads by sampling from this pool of basic components. The generated workloads are adjusted dynamically to the conditions of the simulated system using a feedback loop, which may change the throughput. Using this methodology, analysts can create multiple varied (but related) workloads from the same original log, while retaining much of the structure that exists in the original workload. Resampling with feedback thus provides a new way to use workload logs that benefits from their realism while eliminating many of their drawbacks. In addition, it enables evaluations of throughput effects that are impossible with static workloads.
This paper reflects a keynote address at JSSPP 2021, and provides more details than a previous version from a keynote at Euro-Par 2016 [18]. It summarizes my and my students’ work and reflects a personal view. The goal is to show the big picture and the building and interplay of ideas, at the possible expense of not providing a full overview of and comparison with related work.
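The core idea of the abstract can be illustrated with a minimal sketch: bootstrap-sample whole per-user job streams from a log, then replay them through a simulated system in which each user submits their next job only after the previous one completes. This closes the feedback loop, so the simulated system's speed affects the generated load and hence the throughput. The data layout, the single FCFS server, and all function names here are illustrative assumptions for exposition, not the simulator actually used in this work.

```python
import heapq
import random

def resample_users(user_streams, n_users, seed=0):
    """Bootstrap-sample whole per-user job streams, with replacement.
    Each stream is a list of (think_time, runtime) pairs: the think time
    elapses after the user's previous job completes (or from time 0 for
    the first job), before the next job is submitted."""
    rng = random.Random(seed)
    return [list(rng.choice(user_streams)) for _ in range(n_users)]

def simulate_fcfs_with_feedback(streams):
    """Replay the resampled user streams through a single FCFS server.
    Feedback: a user submits job j+1 only after job j has completed,
    plus the recorded think time, so a slower system throttles each
    user's submission rate instead of replaying fixed timestamps."""
    pending = []  # min-heap of (submit_time, user_index, job_index)
    for u, jobs in enumerate(streams):
        if jobs:
            heapq.heappush(pending, (jobs[0][0], u, 0))
    server_free = 0.0
    completed = 0
    makespan = 0.0
    while pending:
        submit, u, j = heapq.heappop(pending)
        start = max(submit, server_free)
        finish = start + streams[u][j][1]
        server_free = finish
        completed += 1
        makespan = finish
        if j + 1 < len(streams[u]):
            # Feedback: next job's submission depends on this completion.
            heapq.heappush(pending, (finish + streams[u][j + 1][0], u, j + 1))
    return completed, makespan

# Hypothetical pool of per-user job streams mined from a log.
pool = [[(0, 2), (1, 2)], [(0, 5)], [(3, 1), (2, 1), (2, 1)]]
streams = resample_users(pool, n_users=4, seed=1)
jobs, span = simulate_fcfs_with_feedback(streams)
```

Because submission times are derived from completions rather than copied from the log, rerunning the same resampled streams on a faster or slower simulated system yields a different makespan and throughput, which is exactly the effect a static trace replay cannot capture.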
References
Chapin, S.J., et al.: Benchmarks and standards for the evaluation of parallel job schedulers. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 67–90. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_4
Cirne, W., Berman, F.: A comprehensive model of the supercomputer workload. In: 4th Workshop on Workload Characterization, pp. 140–148 (2001). https://doi.org/10.1109/WWC.2001.990753
Denning, P.J.: Performance analysis: experimental computer science at its best. Comm. ACM 24(11), 725–727 (1981). https://doi.org/10.1145/358790.358791
Downey, A.B.: A parallel workload model and its implications for processor allocation. Cluster Comput. 1(1), 133–145 (1998). https://doi.org/10.1023/A:1019077214124
Downey, A.B., Feitelson, D.G.: The elusive goal of workload characterization. Perform. Eval. Rev. 26(4), 14–29 (1999). https://doi.org/10.1145/309746.309750
Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Statist. 7(1), 1–26 (1979). https://doi.org/10.1214/aos/1176344552
Efron, B., Gong, G.: A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat. 37(1), 36–48 (1983). https://doi.org/10.2307/2685844
Feitelson, D.G.: Memory usage in the LANL CM-5 workload. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 78–94. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_17
Feitelson, D.G.: Metrics for parallel job scheduling and their convergence. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 188–205. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45540-X_11
Feitelson, D.G.: The forgotten factor: facts on performance evaluation and its dependence on workloads. In: Monien, B., Feldmann, R. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 49–60. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45706-2_4
Feitelson, D.G.: Workload modeling for performance evaluation. In: Calzarossa, M.C., Tucci, S. (eds.) Performance 2002. LNCS, vol. 2459, pp. 114–141. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45798-4_6
Feitelson, D.G.: Metric and workload effects on computer systems evaluation. Computer 36(9), 18–25 (2003). https://doi.org/10.1109/MC.2003.1231190
Feitelson, D.G.: Experimental analysis of the root causes of performance evaluation results: a backfilling case study. IEEE Trans. Parallel Distrib. Syst. 16(2), 175–182 (2005). https://doi.org/10.1109/TPDS.2005.18
Feitelson, D.G.: Experimental computer science: the need for a cultural change (2005). http://www.cs.huji.ac.il/~feit/papers/exp05.pdf
Feitelson, D.G.: Locality of sampling and diversity in parallel system workloads. In: 21st International Conference Supercomputing, pp. 53–63 (2007). https://doi.org/10.1145/1274971.1274982
Feitelson, D.G.: Looking at data. In: 22nd IEEE International Symposium on Parallel and Distributed Processing (2008). https://doi.org/10.1109/IPDPS.2008.4536092
Feitelson, D.G.: Workload Modeling for Computer Systems Performance Evaluation. Cambridge University Press, Cambridge (2015)
Feitelson, D.G.: Resampling with feedback — a new paradigm of using workload data for performance evaluation. In: Dutot, P.-F., Trystram, D. (eds.) Euro-Par 2016. LNCS, vol. 9833, pp. 3–21. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-43659-3_1
Feitelson, D.G., Mu’alem, A.W.: On the definition of “on-line’’ in job scheduling problems. SIGACT News 36(1), 122–131 (2005). https://doi.org/10.1145/1052796.1052797
Feitelson, D.G., Mu’alem Weil, A.: Utilization and predictability in scheduling the IBM SP2 with backfilling. In: 12th International Parallel Processing Symposium, pp. 542–546 (1998). https://doi.org/10.1109/IPPS.1998.669970
Feitelson, D.G., Naaman, M.: Self-tuning systems. IEEE Softw. 16(2), 52–60 (1999). https://doi.org/10.1109/52.754053
Feitelson, D.G., Nitzberg, B.: Job characteristics of a production parallel scientific workload on the NASA Ames iPSC/860. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1995. LNCS, vol. 949, pp. 337–360. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60153-8_38
Feitelson, D.G., Rudolph, L.: Distributed hierarchical control for parallel processing. Computer 23(5), 65–77 (1990). https://doi.org/10.1109/2.53356
Feitelson, D.G., Rudolph, L.: Evaluation of design choices for gang scheduling using distributed hierarchical control. J. Parallel Distrib. Comput. 35(1), 18–34 (1996). https://doi.org/10.1006/jpdc.1996.0064
Feitelson, D.G., Rudolph, L.: Metrics and benchmarking for parallel job scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1998. LNCS, vol. 1459, pp. 1–24. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0053978
Feitelson, D.G., Shmueli, E.: A case for conservative workload modeling: parallel job scheduling with daily cycles of activity. In: 17th Modelling, Analysis & Simulation of Computer and Telecommunication Systems (2009). https://doi.org/10.1109/MASCOT.2009.5366139
Feitelson, D.G., Tsafrir, D.: Workload sanitation for performance evaluation. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 221–230 (2006). https://doi.org/10.1109/ISPASS.2006.1620806
Feitelson, D.G., Tsafrir, D., Krakov, D.: Experience with using the parallel workloads archive. J. Parallel Distrib. Comput. 74(10), 2967–2982 (2014). https://doi.org/10.1016/j.jpdc.2014.06.013
Floyd, S., Paxson, V.: Difficulties in simulating the Internet. IEEE/ACM Trans. Netw. 9(4), 392–403 (2001). https://doi.org/10.1109/90.944338
Jann, J., Pattnaik, P., Franke, H., Wang, F., Skovira, J., Riordan, J.: Modeling of workload in MPPs. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 95–116. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_18
Lifka, D.A.: The ANL/IBM SP scheduling system. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1995. LNCS, vol. 949, pp. 295–303. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60153-8_35
Lublin, U., Feitelson, D.G.: The workload on parallel supercomputers: modeling the characteristics of rigid jobs. J. Parallel Distrib. Comput. 63(11), 1105–1122 (2003). https://doi.org/10.1016/S0743-7315(03)00108-4
Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel Distrib. Syst. 12(6), 529–543 (2001). https://doi.org/10.1109/71.932708
Parallel Workloads Archive. http://www.cs.huji.ac.il/labs/parallel/workload/
Prasad, R.S., Dovrolis, C.: Measuring the congestion responsiveness of internet traffic. In: Uhlig, S., Papagiannaki, K., Bonaventure, O. (eds.) PAM 2007. LNCS, vol. 4427, pp. 176–185. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71617-4_18
Schroeder, B., Harchol-Balter, M.: Web servers under overload: how scheduling can help. ACM Trans. Internet Technol. 6(1), 20–52 (2006)
Shmueli, E., Feitelson, D.G.: Using site-level modeling to evaluate the performance of parallel system schedulers. In: 14th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 167–176 (2006). https://doi.org/10.1109/MASCOTS.2006.50
Shmueli, E., Feitelson, D.G.: Uncovering the effect of system performance on user behavior from traces of parallel systems. In: 15th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 274–280 (2007). https://doi.org/10.1109/MASCOTS.2007.67
Shmueli, E., Feitelson, D.G.: On simulation and design of parallel-systems schedulers: are we doing the right thing? IEEE Trans. Parallel Distrib. Syst. 20(7), 983–996 (2009). https://doi.org/10.1109/TPDS.2008.152
Snir, M.: Computer and information science and engineering: one discipline, many specialties. Comm. ACM 54(3), 38–43 (2011). https://doi.org/10.1145/1897852.1897867
Talby, D., Feitelson, D.G., Raveh, A.: Comparing logs and models of parallel workloads using the co-plot method. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 43–66. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_3
Tsafrir, D., Etsion, Y., Feitelson, D.G.: Modeling user runtime estimates. In: Feitelson, D., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2005. LNCS, vol. 3834, pp. 1–35. Springer, Heidelberg (2005). https://doi.org/10.1007/11605300_1
Tsafrir, D., Etsion, Y., Feitelson, D.G.: Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans. Parallel Distrib. Syst. 18(6), 789–803 (2007). https://doi.org/10.1109/TPDS.2007.70606
Tsafrir, D., Feitelson, D.G.: Instability in parallel job scheduling simulation: the role of workload flurries. In: 20th International Parallel & Distributed Processing Symposium (2006). https://doi.org/10.1109/IPDPS.2006.1639311
Tsafrir, D., Feitelson, D.G.: The dynamics of backfilling: solving the mystery of why increased inaccuracy may help. In: IEEE International Symposium on Workload Characterization, pp. 131–141 (2006). https://doi.org/10.1109/IISWC.2006.302737
Tsafrir, D., Ouaknine, K., Feitelson, D.G.: Reducing performance evaluation sensitivity and variability by input shaking. In: 15th Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 231–237 (2007). https://doi.org/10.1109/MASCOTS.2007.58
Willinger, W., Taqqu, M.S., Sherman, R., Wilson, D.V.: Self-similarity through high-variability: statistical analysis of Ethernet LAN traffic at the source level. In: ACM SIGCOMM Conference, pp. 100–113 (1995)
Zakay, N., Feitelson, D.G.: On identifying user session boundaries in parallel workload logs. In: Cirne, W., Desai, N., Frachtenberg, E., Schwiegelshohn, U. (eds.) JSSPP 2012. LNCS, vol. 7698, pp. 216–234. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35867-8_12
Zakay, N., Feitelson, D.G.: Workload resampling for performance evaluation of parallel job schedulers. Concurr. Comput. Pract. Exp. 26(12), 2079–2105 (2014). https://doi.org/10.1002/cpe.3240
Zakay, N., Feitelson, D.G.: Preserving user behavior characteristics in trace-based simulation of parallel job scheduling. In: 22nd Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 51–60 (2014). https://doi.org/10.1109/MASCOTS.2014.15
Zakay, N., Feitelson, D.G.: Semi-open trace based simulation for reliable evaluation of job throughput and user productivity. In: 7th IEEE International Conference on Cloud Computing Technology & Science, pp. 413–421 (2015). https://doi.org/10.1109/CloudCom.2015.35
Zotkin, D., Keleher, P.J.: Job-length estimation and performance in backfilling schedulers. In: 8th International Symposium on High Performance Distributed Computing, pp. 236–243 (1999). https://doi.org/10.1109/HPDC.1999.805303
Acknowledgments
The work described here was by and large performed by several outstanding students, especially Edi Shmueli, Netanel Zakay, and Dan Tsafrir. Our work was supported by the Israel Science Foundation (grants no. 219/99 and 167/03) and the Ministry of Science and Technology, Israel.
© 2021 Springer Nature Switzerland AG
Cite this paper
Feitelson, D.G. (2021). Resampling with Feedback: A New Paradigm of Using Workload Data for Performance Evaluation. In: Klusáček, D., Cirne, W., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2021. Lecture Notes in Computer Science(), vol 12985. Springer, Cham. https://doi.org/10.1007/978-3-030-88224-2_1
Print ISBN: 978-3-030-88223-5
Online ISBN: 978-3-030-88224-2