An intelligent scheme for assigning queries

Applied Intelligence

Abstract

Analytics on top of large-scale data streams are a key research subject for future decision-making applications. The huge volumes of data make partitioning imperative for supporting novel applications efficiently. Such applications should build on intelligent, efficient methods for querying multiple data partitions. A processor is placed in front of each partition and is dedicated to managing and executing queries for that specific piece of data. Continuous queries over these data sources require intelligent mechanisms that deliver the final outcome (the query response) in the minimum time with the maximum performance. This paper proposes a mechanism that governs the behavior of the entity responsible for handling the incoming queries. Our mechanism adopts a time-optimized scheme that selects the appropriate processor(s) for each incoming query through the use of the Odds algorithm. We aim to reach the optimal assignment of queries to processors in the minimum time while maximizing the performance. We provide mathematical formulations that describe the problem and present simulation results and a comparative analysis. Through a large number of experiments, we reveal the advantages of the model and give numerical results comparing it with a deterministic model as well as with other efforts in the domain.
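
The abstract names the Odds algorithm but the preview does not show how the paper applies it. As a rough illustration only, the sketch below implements the generic odds strategy from optimal stopping theory: sum the odds r_k = p_k / (1 - p_k) backwards from the last observation until the sum reaches 1, then accept the first "success" from that index onward. The function names, the use of per-processor scores, and the choice p_k = 1/(k+1) (the classical best-choice setting) are assumptions made for this example, not the paper's formulation.

```python
def odds_threshold(probs):
    """Return (s, win_prob) for the generic odds strategy.

    probs[k] is the assumed probability that observation k is a "success".
    s is the 0-based index from which the first success should be accepted;
    win_prob is the probability of stopping on the last success.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    odds_sum = 0.0   # sum of odds p/(1-p), accumulated from the end
    fail_prod = 1.0  # product of failure probabilities (1-p), from the end
    for k in range(len(probs) - 1, -1, -1):
        p = probs[k]
        if p >= 1.0:
            # A certain success has infinite odds: start here. We win if
            # no later observation is a success.
            return k, fail_prod
        odds_sum += p / (1.0 - p)
        fail_prod *= 1.0 - p
        if odds_sum >= 1.0:
            return k, fail_prod * odds_sum
    # The odds never summed to 1: accept the first success from the start.
    return 0, fail_prod * odds_sum


def pick_processor(scores):
    """Pick a processor index while scanning hypothetical scores in order.

    Assumption for this sketch: a "success" is a score that beats every
    score seen so far, with probability 1/(k+1) at position k.
    """
    n = len(scores)
    s, _ = odds_threshold([1.0 / (k + 1) for k in range(n)])
    best_seen = max(scores[:s], default=float("-inf"))
    for k in range(s, n):
        if scores[k] > best_seen:
            return k  # first record value at or after the threshold
    return n - 1  # no record appeared: fall back to the last candidate


if __name__ == "__main__":
    print(odds_threshold([1.0, 1 / 2, 1 / 3, 1 / 4]))
    print(pick_processor([0.5, 0.3, 0.2, 0.9]))
```

With four candidates the threshold is s = 1 and the success probability is 11/24 (about 0.458), so the strategy observes one candidate and then takes the first that beats everything seen so far. Falling back to the last candidate when no record appears is a design choice of this sketch; a real assignment scheme might instead re-queue the query or apply a timeout.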

Author information

Corresponding author

Correspondence to Kostas Kolomvatsos.

About this article

Cite this article

Kolomvatsos, K. An intelligent scheme for assigning queries. Appl Intell 48, 2730–2745 (2018). https://doi.org/10.1007/s10489-017-1099-5

  • DOI: https://doi.org/10.1007/s10489-017-1099-5