An Architecture for the Development of Distributed Analytics Based on Polystore Events

Zolotas, Athanasios; Barmpis, Konstantinos; Medhat, Fady; Neubauer, Patrick; Kolovos, Dimitris; Paige, Richard F.

doi:10.1007/978-3-030-71055-2_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12633)

Included in the following conference series:

VLDB Workshop on Data Management and Analytics for Medicine and Healthcare
VLDB Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances

Abstract

To balance the requirements for data consistency and availability, organisations are increasingly migrating towards hybrid data persistence architectures (called polystores throughout this paper) that comprise both relational and NoSQL databases. The EC-funded H2020 TYPHON project offers facilities for designing and deploying such polystores, which is otherwise a complex, technically challenging and error-prone task. In addition, it is increasingly important for organisations to be able to extract business intelligence by monitoring the data stored in polystores. In this paper, we propose a novel approach that facilitates the extraction of analytics in a distributed manner by monitoring polystore queries as they arrive for execution. Beyond the analytics architecture, we present a pre-execution authorisation mechanism. We also report on preliminary scalability evaluation experiments, which demonstrate that the proposed architecture scales linearly.
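
The abstract describes two hooks applied to each query as it arrives for execution: a pre-execution authorisation check, and the emission of an event that feeds distributed analytics. The Python sketch below illustrates that flow under stated assumptions. It is not the TYPHON implementation. All of the names (QueryEvent, QueryMonitor, AnalyticsWorker, no_anonymous_deletes, fake_execute) are hypothetical. The in-process workers stand in for parallel stream consumers. The notes point to Kafka and ZooKeeper Docker images, but no Kafka API is used here. Only the select query text is taken from note 1.

```python
# Hypothetical sketch: QueryEvent, QueryMonitor and AnalyticsWorker are
# illustrative names, not part of the TYPHON tooling.
import zlib
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class QueryEvent:
    user: str    # who issued the query
    entity: str  # entity the query targets, e.g. "User"
    kind: str    # query kind, e.g. "select" or "delete"
    text: str    # raw query text


class AnalyticsWorker:
    """Aggregates query counts per (entity, kind) for the events routed to it."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def consume(self, event: QueryEvent) -> None:
        self.counts[(event.entity, event.kind)] += 1


class QueryMonitor:
    def __init__(self, authorisers: List[Callable[[QueryEvent], bool]],
                 n_workers: int) -> None:
        self.authorisers = authorisers
        self.workers = [AnalyticsWorker() for _ in range(n_workers)]

    def _route(self, event: QueryEvent) -> AnalyticsWorker:
        # Stable hash of the entity name: every event for one entity lands on
        # the same worker, so per-entity aggregates need no coordination.
        return self.workers[zlib.crc32(event.entity.encode()) % len(self.workers)]

    def submit(self, event: QueryEvent,
               execute: Callable[[QueryEvent], str]) -> Optional[str]:
        # Pre-execution: every authoriser must approve before the query runs.
        if not all(check(event) for check in self.authorisers):
            return None  # rejected queries never reach the databases
        result = execute(event)
        # Post-execution: hand the event to the worker that owns its entity.
        self._route(event).consume(event)
        return result

    def totals(self) -> Counter:
        # Merge the partial aggregates held by each worker.
        merged: Counter = Counter()
        for worker in self.workers:
            merged.update(worker.counts)
        return merged


def no_anonymous_deletes(event: QueryEvent) -> bool:
    # Example policy (hypothetical): anonymous users may not delete data.
    return not (event.user == "anonymous" and event.kind == "delete")


def fake_execute(event: QueryEvent) -> str:
    # Stand-in for sending the query to the polystore.
    return f"executed: {event.text}"


monitor = QueryMonitor([no_anonymous_deletes], n_workers=2)
ok = monitor.submit(
    QueryEvent("alice", "User", "select", "from User u select u.age where u.id == 1"),
    fake_execute,
)
blocked = monitor.submit(
    QueryEvent("anonymous", "User", "delete", "<illustrative delete query>"),
    fake_execute,
)
print(ok)
print(blocked)
print(monitor.totals())
```

In this sketch, events are routed by a stable hash of the entity name. That is one common way to let analytics consumers scale out independently. This excerpt does not say whether the paper partitions events this way.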

Notes

1. An example TyphonQL “select” query: from User u select u.age where u.id == 1.
2. https://hub.docker.com/r/wurstmeister/zookeeper/.
3. https://hub.docker.com/r/wurstmeister/kafka/.
4. AMD Opteron(tm) Processor 4226 (6 cores @ 2.7 GHz), 4 × 16 GB DDR3 1066 MHz RAM.

Acknowledgements

This work is funded by the European Union Horizon 2020 TYPHON project (#780251).

Author information

Authors and Affiliations

Department of Computer Science, University of York, York, UK
Athanasios Zolotas, Konstantinos Barmpis, Fady Medhat, Patrick Neubauer, Dimitris Kolovos & Richard F. Paige
Department of Computer Science, McMaster University, Hamilton, Canada
Richard F. Paige

Corresponding author

Correspondence to Athanasios Zolotas.

Editor information

Editors and Affiliations

Massachusetts Institute of Technology, Lexington, MA, USA
Vijay Gadepally
Intel Corporation, Portland, OR, USA
Timothy Mattson
Massachusetts Institute of Technology, Cambridge, MA, USA
Michael Stonebraker
Massachusetts Institute of Technology, Cambridge, MA, USA
Tim Kraska
Stony Brook University, Stony Brook, NY, USA
Fusheng Wang
University of Washington, Seattle, WA, USA
Gang Luo
Georgia State University, Atlanta, GA, USA
Jun Kong
Lucerne University of Applied Sciences, Rotkreuz, Switzerland
Alevtina Dubovitskaya

About this paper

Cite this paper

Zolotas, A., Barmpis, K., Medhat, F., Neubauer, P., Kolovos, D., Paige, R.F. (2021). An Architecture for the Development of Distributed Analytics Based on Polystore Events. In: Gadepally, V., et al. (eds) Heterogeneous Data Management, Polystores, and Analytics for Healthcare. DMAH 2020, Poly 2020. Lecture Notes in Computer Science, vol 12633. Springer, Cham. https://doi.org/10.1007/978-3-030-71055-2_5

DOI: https://doi.org/10.1007/978-3-030-71055-2_5
Published: 04 March 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-71054-5
Online ISBN: 978-3-030-71055-2
eBook Packages: Computer Science, Computer Science (R0)
