
Scheduling data analytics work with performance guarantees: queuing and machine learning models in synergy

Cluster Computing

Abstract

In today’s scaled-out systems, co-scheduling data analytics work with high-priority user workloads is common because it makes better use of the abundant hardware. User workloads are dominated by periodic patterns, with alternating periods of high and low utilization, which creates promising conditions for scheduling data analytics work during low-activity periods. To this end, we show that machine learning models can accurately predict user workload intensities and thereby suggest the most opportune times to co-schedule data analytics work. Yet machine learning models cannot predict the effects of performance interference when co-scheduling is employed, because this constitutes a “new” observation. In tiered storage systems specifically, the hierarchical design makes performance interference even more complex, so accurate performance prediction is more challenging. Here, we quantify the unknown performance effects of workload co-scheduling by augmenting machine learning models with queuing theory models, yielding a hybrid approach that can accurately predict performance and guide scheduling decisions in a tiered storage system. Using traces from commercial systems, we illustrate that queuing theory and machine learning models can be used in synergy to overcome their respective weaknesses and deliver robust co-scheduling solutions that achieve high performance.
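
The division of labor described in the abstract can be illustrated with a minimal sketch. It is not the models used in the article, which are not reproduced on this page. As stand-in assumptions, the predictor is a simple same-phase average over past periods (in place of a trained machine learning model), and the interference estimate is the textbook M/M/1 mean response time for a single shared queue (in place of a tiered storage model). All function names and numbers are hypothetical.

```python
from statistics import mean


def seasonal_forecast(history, period):
    """Forecast one period ahead by averaging same-phase samples of past periods.

    Assumption: a stand-in for a learned workload predictor.
    """
    n_periods = len(history) // period
    if n_periods == 0:
        raise ValueError("need at least one full period of history")
    recent = history[-n_periods * period:]
    return [
        mean(recent[k * period + i] for k in range(n_periods))
        for i in range(period)
    ]


def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: 1 / (mu - lambda).

    Returns infinity when the queue is unstable (lambda >= mu).
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


def schedulable_slots(forecast, service_rate, analytics_rate, target):
    """Return the slots where the predicted user load plus the analytics load
    still meets the response time target.

    Assumption: both workloads share one FIFO queue, so the combined
    arrival rate is simply the sum of the two rates.
    """
    return [
        slot
        for slot, user_rate in enumerate(forecast)
        if mm1_response_time(user_rate + analytics_rate, service_rate) <= target
    ]


# Hypothetical arrival rates (requests/s) over two periods of four slots each.
history = [20, 80, 20, 80, 30, 90, 30, 90]
forecast = seasonal_forecast(history, period=4)
slots = schedulable_slots(forecast, service_rate=100, analytics_rate=30, target=0.05)
print(forecast, slots)
```

The predictor only says when user load is expected to be low. The queuing formula supplies the part that history cannot, namely the response time under a combined load that has never been observed.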

Notes

  1. The Wikipedia traces are publicly available [4]. Due to confidentiality agreements, the storage system trace and provider details cannot be made publicly available.

  2. We assume \(t_w = 1\) min in our experimental evaluation, but this could be adjusted according to the specific system requirement.

  3. For presentation reasons, we use a 2-tiered storage system to explain our methodology, but it could be easily extended to storage systems with more tiers. This discussion also applies to caching.

References

  1. Herodotou, H., Lim, H., Luo, G., Borisov, N., Dong, L., Cetin, F.B., Babu, S.: Starfish: a self-tuning system for big data analytics. CIDR 11, 261–272 (2011)

  2. Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J.M., Welton, C.: MAD skills: new analysis practices for big data. Proc. VLDB Endow. 2(2), 1481–1492 (2009)

  3. Zhuang, Z., Ramachandra, H., Tran, C., Subramaniam, S., Botev, C., Xiong, C., Sridharan, B.: Capacity planning and headroom analysis for taming database replication latency: experiences with LinkedIn internet traffic. In: Proceedings of the 6th ACM/SPEC ICPE, pp. 39–50 (2015)

  4. Urdaneta, G., Pierre, G., van Steen, M.: Wikipedia workload analysis for decentralized hosting. Elsevier Comput. Netw. 53(11), 1830–1845 (2009)

  5. Xue, J., Yan, F., Riska, A., Smirni, E.: Storage workload isolation via tier warming: how models can help. In: Proceedings of the 11th ICAC, pp. 1–11 (2014)

  6. Peters, M.: 3PAR: optimizing I/O service levels. ESG White Paper (2010)

  7. Laliberte, B.: Automate and optimize a tiered storage environment—FAST! ESG White Paper (2009)

  8. Amazon ElastiCache. http://aws.amazon.com/elasticache. Accessed 11 Mar 2015

  9. Guerra, J., Pucha, H., Glider, J.S., Belluomini, W., Rangaswami, R.: Cost effective storage using extent based dynamic tiering. In: FAST, pp. 273–286 (2011)

  10. Oh, Y., Choi, J., Lee, D., Noh, S.H.: Caching less for better performance: balancing cache size and update cost of flash memory cache in hybrid storage systems. In: FAST, pp. 313–326 (2012)

  11. FIO Benchmark. http://www.freecode.com/projects/fio. Accessed 11 Mar 2015

  12. Bjorkqvist, M., Chen, L.Y., Binder, W.: Opportunistic service provisioning in the cloud. In: 5th IEEE CLOUD, pp. 237–244 (2012)

  13. Ansaloni, D., Chen, L.Y., Smirni, E., Binder, W.: Model-driven consolidation of Java workloads on multicores. In: 42nd IEEE/IFIP DSN, pp. 1–12 (2012)

  14. Birke, R., Björkqvist, M., Chen, L.Y., Smirni, E., Engbersen, T.: (Big)data in a virtualized world: volume, velocity, and variety in cloud datacenters. In: FAST, pp. 177–189 (2014)

  15. Leemis, L.M., Park, S.K.: Discrete-Event Simulation: A First Course. Pearson Prentice Hall, Upper Saddle River (2006)

  16. George, B.: Time Series Analysis: Forecasting & Control, 3rd edn. Pearson Education India, Gurgaon (1994)

  17. Goodwin, P.: The Holt-Winters approach to exponential smoothing: 50 years old and going strong. In: Foresight, pp. 30–34 (2010)

  18. Frank, R.J., Davey, N., Hunt, S.P.: Time series prediction and neural networks. J. Intell. Robot. Syst. 31(1–3), 91–103 (2001)

  19. Hassoun, M.H.: Fundamentals of Artificial Neural Networks, 1st edn. MIT Press, Cambridge (1995)

  20. Hill, T., O’Connor, M., Remus, W.: Neural network models for time series forecasts. Manag. Sci. 42(7), 1082–1092 (1996)

  21. Demuth, H., Beale, M., Hagan, M.: Neural Network Toolbox™ 6, User Guide

  22. Stokely, M., Mehrabian, A., Albrecht, C., Labelle, F., Merchant, A.: Projecting disk usage based on historical trends in a cloud environment. In: Proceedings of the 3rd Workshop on ScienceCloud, pp. 63–70 (2012)

  23. Ross, S.M.: Introduction to Probability and Statistics for Engineers and Scientists. Academic Press, Cambridge (2009)

  24. Tijms, H.C.: A First Course in Stochastic Models. Wiley, New York (2003)

  25. Lim, H.C., Babu, S., Chase, J.S.: Automated control for elastic storage. In: Proceedings of the 7th ICAC. ACM, pp. 1–10 (2010)

  26. Cucinotta, T., Checconi, F., Abeni, L., Palopoli, L.: Self-tuning schedulers for legacy real-time applications. In: Proceedings of the 5th EuroSys, pp. 55–68 (2010)

  27. Ferrer, A.J., Hernández, F., Tordsson, J., Elmroth, E., Ali-Eldin, A., Zsigri, C., Sirvent, R., Guitart, J., Badia, R.M., Djemame, K., et al.: Optimis: a holistic approach to cloud service provisioning. Futur. Gener. Comput. Syst. 28, 66–77 (2012)

  28. Singh, R., Shenoy, P., Natu, M., Sadaphal, V., Vin, H.: Analytical modeling for what-if analysis in complex cloud computing applications. ACM SIGMETRICS Perform. Eval. Rev. 40(4), 53–62 (2013)

  29. Zhang, Q., Cherkasova, L., Smirni, E.: A regression-based analytic model for dynamic resource provisioning of multi-tier applications. In: Proceedings of the 4th ICAC, pp. 27–36 (2007)

  30. Yan, F., Riska, A., Smirni, E.: Busy bee: how to use traffic information for better scheduling of background tasks. In: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pp. 145–156 (2012)

  31. Cortez, P., Rio, M., Rocha, M., Sousa, P.: Multi-scale internet traffic forecasting using neural networks and time series methods. Expert Syst. 29(2), 143–155 (2012)

  32. Li, J., Moore, A.W.: Forecasting web page views: methods and observations. J. Mach. Learn. Res. 9(10), 2217–2250 (2008)

  33. Couceiro, M., Romano, P., Rodrigues, L.: A machine learning approach to performance prediction of total order broadcast protocols. In: 4th IEEE SASO, pp. 184–193 (2010)

  34. Didona, D., Quaglia, F., Romano, P., Torre, E.: Enhancing performance prediction robustness by combining analytical modeling and machine learning. In: Proceedings of the 6th ACM/SPEC ICPE, pp. 145–156 (2015)

Acknowledgments

This work is supported by NSF Grant CCF-1218758.

Author information

Corresponding author

Correspondence to Feng Yan.

About this article

Cite this article

Xue, J., Yan, F., Riska, A. et al. Scheduling data analytics work with performance guarantees: queuing and machine learning models in synergy. Cluster Comput 19, 849–864 (2016). https://doi.org/10.1007/s10586-016-0563-z

  • DOI: https://doi.org/10.1007/s10586-016-0563-z