Autonomic workload performance tuning in large-scale data repositories

  • Survey Paper
  • Knowledge and Information Systems

Abstract

The workload in large-scale data repositories involves concurrent users and contains homogeneous and heterogeneous data. The large data volume, dynamic behavior and versatility of large-scale data repositories make them difficult for humans to manage, so computational power is required to manage the load on current servers. Autonomic technology can support predicting the workload type, whether decision support system or online transaction processing, which helps servers adapt autonomously to their workloads. An intelligent system can be designed that knows the type of workload in advance, predicts the workload's performance and adapts autonomically to the workload's changing behavior. Workload management involves effectively monitoring and controlling the workflow of queries in large-scale data repositories. This work presents a taxonomy, derived through systematic analysis, of workload management in large-scale data repositories, including database management systems and data warehouses, with respect to autonomic computing (AC). State-of-the-art practices in large-scale data repositories are reviewed with respect to AC for workload characterization, performance prediction and adaptation. Current issues and future directions are highlighted at the end.

Abbreviations

DBMS: Database management system
ADBMS: Autonomic database management system
AWPT: Autonomic workload performance tuning
OLAP: Online analytical processing
OLTP: Online transaction processing
KCCA: Kernel canonical correlation analysis
TPC: Transaction Processing Performance Council
DBA: Database administrator
SVM: Support vector machines
QEP: Query execution plan
AC: Autonomic computing
QoS: Quality of service
KNN: K-nearest neighbor
OSN: Online social network
CBMG: Customer behavior model graph
GC: Garbage collection
CRT: Classification and regression tree
BI: Business intelligence
PCA: Principal component analysis
CCA: Canonical correlation analysis
QP: Query Patroller
PQR: Predictions of query runtime
SLA: Service level agreement
EQMS: External queue management system
WCF: Workload classification and forecasting
MAPEK: Monitor, Analyze, Plan, Execute, Knowledge
DML: Descartes Modeling Language
ANN: Artificial neural network

Acknowledgements

The study is funded by COMSATS University Islamabad (CUI), Islamabad, Pakistan, under CIIT/ORIC-PD/17. We thank the reviewers for their suggestions and comments, which helped improve the quality of the paper.

Author information

Corresponding author

Correspondence to Basit Raza.

Cite this article

Raza, B., Sher, A., Afzal, S. et al. Autonomic workload performance tuning in large-scale data repositories. Knowl Inf Syst 61, 27–63 (2019). https://doi.org/10.1007/s10115-018-1272-0

  • DOI: https://doi.org/10.1007/s10115-018-1272-0