Learning the engagement of query processors for intelligent analytics

Kolomvatsos, Kostas; Hadjiefthymiades, Stathes

doi:10.1007/s10489-016-0821-z

Published: 26 July 2016

Applied Intelligence, volume 46, pages 96–112 (2017)

Abstract

Current applications must process huge amounts of data produced by other applications or by end users' personal devices. In such settings, intelligent analytics on top of large-scale data is a key research subject for future data-driven decision making. Because of the data volume, analytics should rely on an efficient technique for querying big data partitions. Each partition contains only a part of the data, and a processor is dedicated to executing queries on the corresponding partition. A Query Controller (QC) manages continuous queries and returns the final outcome to users or applications by using the underlying processors. In this paper, we propose a learning scheme that the QC adopts to allocate each query to the available processors. We adopt the Q-learning algorithm to calculate the reward that the QC obtains for every allocation of a query to a processor. The outcome is an efficient model that derives the optimal allocation for incoming queries. We provide mathematical formulations for the problem and present simulation results. Through a large number of simulations, we reveal the advantages of the proposed model and report numerical results that compare our framework with a baseline model.
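
The preview does not show the paper's state, action, or reward definitions, so the following is only a minimal sketch of how tabular Q-learning could drive a query-to-processor allocation. It assumes, purely for illustration, that the state is a discrete query type, the action is the index of the processor that receives the query, and the next query type arrives at random. The names `train_allocator`, `best_processor`, and `toy_reward`, as well as all parameter values, are hypothetical and are not taken from the article.

```python
import random


def toy_reward(query_type, processor):
    # Hypothetical reward: processor i is assumed to hold the partition
    # that best serves query type i. The paper's reward is not shown here.
    return 1.0 if processor == query_type else 0.0


def train_allocator(num_query_types, num_processors, reward_fn,
                    episodes=5000, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    q[s][a] estimates the long-run reward of sending a query of
    type s to processor a (an assumed state/action encoding).
    """
    rng = random.Random(seed)
    q = [[0.0] * num_processors for _ in range(num_query_types)]
    state = rng.randrange(num_query_types)
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try a random processor.
            action = rng.randrange(num_processors)
        else:
            # Exploit: pick the processor with the highest current estimate.
            action = max(range(num_processors), key=lambda a: q[state][a])
        reward = reward_fn(state, action)
        # Assumption: the next query type does not depend on the allocation.
        next_state = rng.randrange(num_query_types)
        # Standard Q-learning update toward reward + discounted best next value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def best_processor(q, query_type):
    """Return the processor index with the highest learned value."""
    row = q[query_type]
    return max(range(len(row)), key=lambda a: row[a])
```

Calling `train_allocator(3, 3, toy_reward)` and then `best_processor(q, s)` for each query type `s` returns the allocation the learned table prefers. In a real deployment the reward would have to reflect measured quantities such as response time or processor load, which this toy function does not model.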

Author information

Authors and Affiliations

Department of Computer Science, University of Thessaly, Lamia, 35100, Greece
Kostas Kolomvatsos
Department of Informatics and Telecommunications, University of Athens, Athens, 15784, Greece
Stathes Hadjiefthymiades

Corresponding author

Correspondence to Kostas Kolomvatsos.

About this article

Cite this article

Kolomvatsos, K., Hadjiefthymiades, S. Learning the engagement of query processors for intelligent analytics. Appl Intell 46, 96–112 (2017). https://doi.org/10.1007/s10489-016-0821-z

Issue Date: January 2017
