Abstract
Workloads in large-scale data repositories come from many concurrent users and involve both homogeneous and heterogeneous data. The large data volume, dynamic behavior, and versatility of these repositories make them difficult for humans to manage, so computational support is needed to manage the load on servers. Autonomic technology can help by predicting the workload type, either decision support system or online transaction processing, so that servers can adapt to the workload autonomously. Knowing the workload type in advance and predicting workload performance makes it possible to design an intelligent system that adapts autonomically to changing workload behavior. Workload management involves effectively monitoring and controlling the workflow of queries in large-scale data repositories. This work presents a taxonomy, derived through systematic analysis, of workload management in large-scale data repositories, including database management systems and data warehouses, with respect to autonomic computing (AC). State-of-the-art practices in large-scale data repositories are reviewed with respect to AC for workload characterization, performance prediction, and adaptation. Open issues and future directions are highlighted at the end.
Abbreviations
- DBMS: Database management system
- ADBMS: Autonomic database management system
- AWPT: Autonomic workload performance tuning
- OLAP: Online analytical processing
- OLTP: Online transaction processing
- KCCA: Kernel canonical correlation analysis
- TPC: Transaction Processing Council
- DBA: Database administrator
- SVM: Support vector machines
- QEP: Query execution plan
- AC: Autonomic computing
- QoS: Quality of service
- KNN: K-nearest neighbor
- OSN: Online social network
- CBMG: Customer behavior model graph
- GC: Garbage collection
- CRT: Classification and regression tree
- BI: Business intelligence
- PCA: Principal component analysis
- CCA: Canonical correlation analysis
- QP: Query patroller
- PQR: Predictions of query runtime
- SLA: Service level agreement
- EQMS: External queue management system
- WCF: Workload classification and forecasting
- MAPEK: Monitor, Analyze, Plan, Execute, Knowledge
- DML: Descartes Modeling Language
- ANN: Artificial neural network
Acknowledgements
This study was funded by COMSATS University Islamabad (CUI), Islamabad, Pakistan, under CIIT/ORIC-PD/17. We thank the reviewers for their suggestions and comments, which helped improve the quality of the paper.
About this article
Cite this article
Raza, B., Sher, A., Afzal, S. et al. Autonomic workload performance tuning in large-scale data repositories. Knowl Inf Syst 61, 27–63 (2019). https://doi.org/10.1007/s10115-018-1272-0
DOI: https://doi.org/10.1007/s10115-018-1272-0