Abstract
A huge amount of data is generated each day from various sources. Analyzing such massive data is difficult and requires new forms of processing to enable enhanced decision making, insight discovery and process optimization. Moreover, besides growing in volume, datasets change frequently, so the results of continuous queries have to be updated at short intervals. In this paper, we address the problem of evaluating continuous queries over big data streams that are frequently updated, adopting HIFUN, a recently introduced high-level query language. HIFUN offers a clear separation between the conceptual layer, where analytic queries are defined independently of the nature and location of data, and the physical layer, where queries are evaluated by encoding them as map-reduce jobs or as SQL group-by queries. Using HIFUN, we devise an algorithm for the incremental processing of continuous queries, which processes only the most recent data partition and exploits already computed information, without re-evaluating the query over the complete dataset. Subsequently, we translate the generic algorithm to both SQL and MapReduce using Spark, exploiting the query rewriting method provided by HIFUN. The experiments performed show the advantages of our solution in terms of query answering efficiency.
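The incremental idea summarized above can be illustrated with a minimal Python sketch. It is not the algorithm from the paper and does not use HIFUN, SQL or Spark. It assumes a group-by query with a sum aggregate, which is distributive, so the answer over the full dataset can be obtained by merging the stored answer with the answer over the newest partition alone. The record schema, the function names `aggregate_partition` and `merge`, and the sample batches are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record schema: (grouping key, measure value).
Record = Tuple[str, float]


def aggregate_partition(records: Iterable[Record]) -> Dict[str, float]:
    """Evaluate the group-by-sum query over a single data partition."""
    partial: Dict[str, float] = defaultdict(float)
    for key, value in records:
        partial[key] += value
    return dict(partial)


def merge(stored: Dict[str, float], delta: Dict[str, float]) -> Dict[str, float]:
    """Combine the stored answer with the answer for the newest partition.

    This is valid here because sum is distributive: the sum over the union
    of two partitions equals the sum of the per-partition sums.
    """
    merged = dict(stored)
    for key, value in delta.items():
        merged[key] = merged.get(key, 0.0) + value
    return merged


# The stored answer starts empty and is updated as each partition arrives.
answer: Dict[str, float] = {}
batch_1 = [("branch_a", 10.0), ("branch_b", 5.0)]
batch_2 = [("branch_a", 2.5), ("branch_c", 4.0)]

answer = merge(answer, aggregate_partition(batch_1))
# Only batch_2 is scanned here; batch_1 is never read again.
answer = merge(answer, aggregate_partition(batch_2))
```

In this sketch each update touches only the newest batch and the stored per-group totals. Aggregates that are not distributive, such as an average, would need extra stored state (for example a sum and a count per group) for the same merge step to remain correct.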
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zervoudakis, P., Kondylakis, H., Plexousakis, D., Spyratos, N. (2020). Incremental Evaluation of Continuous Analytic Queries in HIFUN. In: Flouris, G., Laurent, D., Plexousakis, D., Spyratos, N., Tanaka, Y. (eds) Information Search, Integration, and Personalization. ISIP 2019. Communications in Computer and Information Science, vol 1197. Springer, Cham. https://doi.org/10.1007/978-3-030-44900-1_4
DOI: https://doi.org/10.1007/978-3-030-44900-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44899-8
Online ISBN: 978-3-030-44900-1
eBook Packages: Computer Science, Computer Science (R0)