Learning the engagement of query processors for intelligent analytics

Applied Intelligence

Abstract

Current applications require the processing of huge amounts of data produced by applications or end users' personal devices. In such settings, intelligent analytics on top of large-scale data is the key research subject for future data-driven decision making. Due to the huge amount of data, analytics should be based on an efficient technique for querying big data partitions. Each partition contains only a part of the data, and a processor is dedicated to executing queries for the corresponding partition. A Query Controller (QC) is responsible for managing continuous queries and returning the final outcome to users or applications by using the underlying processors. In this paper, we propose a learning scheme to be adopted by the QC for allocating each query to the available processors. We adopt the Q-learning algorithm to calculate the reward that the QC obtains for every allocation between queries and processors. The outcome is an efficient model that derives the optimal allocation for the incoming queries. We provide mathematical formulations for solving the discussed problem and present our simulation results. Through a large number of simulations, we reveal the advantages of the proposed model and give numerical results while comparing our framework with a baseline model.
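To make the idea of learning query-to-processor allocations concrete, the following is a minimal sketch of tabular Q-learning, not the formulation used in the paper, whose state, action, and reward definitions are not given in this preview. It assumes, purely for illustration, that the state is a query type, the action is the index of the chosen processor, and the reward is the negative of a fixed execution cost; the names `train_allocator`, `allocate`, and `cost` are hypothetical.

```python
import random


def train_allocator(n_query_types, n_processors, cost, episodes=5000,
                    alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over (query type, processor) pairs.

    cost[q][p] is a hypothetical execution cost of running a query of
    type q on processor p; the reward is assumed to be its negative.
    """
    rng = random.Random(seed)
    # q_table[q][p] estimates the long-run return of sending type q to p.
    q_table = [[0.0] * n_processors for _ in range(n_query_types)]
    state = rng.randrange(n_query_types)
    for _ in range(episodes):
        # Epsilon-greedy choice: explore occasionally, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.randrange(n_processors)
        else:
            action = max(range(n_processors),
                         key=lambda p: q_table[state][p])
        reward = -cost[state][action]
        # Assumption: the next incoming query type is drawn at random.
        next_state = rng.randrange(n_query_types)
        best_next = max(q_table[next_state])
        # Standard Q-learning update rule.
        td_target = reward + gamma * best_next
        q_table[state][action] += alpha * (td_target - q_table[state][action])
        state = next_state
    return q_table


def allocate(q_table, query_type):
    """Return the processor with the highest learned value for this type."""
    row = q_table[query_type]
    return max(range(len(row)), key=lambda p: row[p])


if __name__ == "__main__":
    # Two query types, three processors, made-up costs.
    cost = [[1.0, 5.0, 9.0],
            [9.0, 1.0, 5.0]]
    q_table = train_allocator(2, 3, cost)
    print([allocate(q_table, q) for q in range(2)])
```

In this toy setting the learned policy simply recovers the cheapest processor for each query type. A realistic QC would presumably observe richer state, such as processor load or partition statistics, and noisy rewards.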

Author information

Correspondence to Kostas Kolomvatsos.

Cite this article

Kolomvatsos, K., Hadjiefthymiades, S. Learning the engagement of query processors for intelligent analytics. Appl Intell 46, 96–112 (2017). https://doi.org/10.1007/s10489-016-0821-z

  • DOI: https://doi.org/10.1007/s10489-016-0821-z
