Abstract
Current applications must process huge amounts of data produced by other applications or by end users' personal devices. In such settings, intelligent analytics on top of large-scale data is a key research subject for future data-driven decision making. Because of the data volume, analytics should rely on an efficient technique for querying big data partitions. Each partition contains only a part of the data, and a dedicated processor executes queries for the corresponding partition. A Query Controller (QC) is responsible for managing continuous queries and returning the final outcome to users and applications by using the underlying processors. In this paper, we propose a learning scheme to be adopted by the QC for allocating each query to the available processors. We adopt the Q-learning algorithm to calculate the reward that the QC obtains for every allocation of a query to a processor. The outcome is an efficient model that derives the optimal allocation for the incoming queries. We provide mathematical formulations for solving the discussed problem and present our simulation results. Through a large number of simulations, we reveal the advantages of the proposed model and report numerical results comparing our framework with a baseline model.
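To make the idea of learning query-to-processor allocations concrete, the following is a minimal sketch of tabular Q-learning, not the formulation used in the paper. The abstract does not specify the state, action, or reward definitions, so everything here is an assumption made for illustration: the state is a hypothetical query class, the action is the index of the processor chosen, and the reward is the negative of a hypothetical per-allocation cost taken from a fixed `cost` table. The names `train_allocator` and `best_processor` and all parameter values are likewise illustrative.

```python
import random


def train_allocator(cost, steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over (query class, processor) pairs.

    cost[c][p] is a HYPOTHETICAL cost (e.g. response time) of running a
    query of class c on processor p; the reward is its negative.
    """
    rng = random.Random(seed)
    n_classes, n_procs = len(cost), len(cost[0])
    q = [[0.0] * n_procs for _ in range(n_classes)]

    state = rng.randrange(n_classes)  # class of the first incoming query
    for _ in range(steps):
        # Epsilon-greedy choice of the processor for the current query.
        if rng.random() < epsilon:
            action = rng.randrange(n_procs)
        else:
            action = max(range(n_procs), key=lambda p: q[state][p])

        reward = -cost[state][action]
        # Assumption: the next query's class arrives uniformly at random.
        next_state = rng.randrange(n_classes)

        # Standard Q-learning update toward the bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def best_processor(q, query_class):
    """Greedy allocation: the processor with the highest learned Q-value."""
    return max(range(len(q[query_class])), key=lambda p: q[query_class][p])


# Two hypothetical query classes, three hypothetical processors.
cost = [[1.0, 5.0, 9.0],
        [7.0, 2.0, 8.0]]
q_table = train_allocator(cost)
allocation = [best_processor(q_table, c) for c in range(len(cost))]
```

In this toy setting the learned greedy policy should send each query class to its cheapest processor. A fixed or random assignment over the same `cost` table could serve as a simple baseline for comparison, in the spirit of the baseline model mentioned above.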
Cite this article
Kolomvatsos, K., Hadjiefthymiades, S. Learning the engagement of query processors for intelligent analytics. Appl Intell 46, 96–112 (2017). https://doi.org/10.1007/s10489-016-0821-z
DOI: https://doi.org/10.1007/s10489-016-0821-z