PhoeniQ: Failure-Tolerant Query Processing in Multi-node Environments

Bessho, Yutaro; Hayamizu, Yuto; Goda, Kazuo; Kitsuregawa, Masaru

doi:10.1007/978-3-030-59003-1_5

Conference paper
First Online: 14 September 2020

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12391)

Abstract

Parallel processing is a flagship approach for answering analytical queries on large-scale databases. As the database scale increases, more processing nodes are likely to be incorporated to increase the degree of parallelism. However, this solution also increases the probability of node failure. If such a failure happens during query processing, the processing often has to restart from scratch, and this time cost may not be acceptable to the user. In this paper, we propose PhoeniQ, a fault-tolerant query processing mechanism for analytical parallel database systems. PhoeniQ takes a package-level checkpoint for every operator pipeline and replicates the output of stateful operators among different processing nodes. If a single processing node fails during processing, another node can resume from the execution state of the failed node, so that the query continues to run. This paper presents intensive experiments based on our prototype, which demonstrate that PhoeniQ can continue query processing in the face of node failures at a significantly smaller cost than the conventional approach.
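The checkpoint-and-replicate idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the names `Node`, `process_package`, and `run_query`, the filter-plus-sum pipeline, and the choice to replicate after every package are all assumptions made for illustration. The sketch only shows that a backup holding the replicated state of the stateful operator and the last checkpoint can redo the interrupted package and reach the same result as a failure-free run.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical processing node (illustrative, not from the paper)."""
    name: str
    alive: bool = True
    # Checkpoint: index of the next input package to process.
    next_package: int = 0
    # State of the stateful tail operator (a running sum per key).
    state: dict = field(default_factory=dict)


def process_package(state, package):
    """Run one package through a stateless filter and a stateful sum."""
    for key, value in package:
        if value >= 0:                               # stateless operator
            state[key] = state.get(key, 0) + value   # stateful operator


def run_query(packages, primary, backup, fail_at=None):
    """Process packages on `primary`; fail over to `backup` at `fail_at`."""
    worker = primary
    i = worker.next_package
    while i < len(packages):
        if worker is primary and i == fail_at:
            # Simulated crash halfway through a package: the partial work
            # on the primary is lost because it was never replicated.
            half = len(packages[i]) // 2
            process_package(primary.state, packages[i][:half])
            primary.alive = False
            # The backup resumes from the last replicated checkpoint and
            # reprocesses the interrupted package from its beginning.
            worker = backup
            i = backup.next_package
            continue
        process_package(worker.state, packages[i])
        i += 1
        worker.next_package = i
        if worker is primary:
            # Package-level checkpoint: replicate state and position.
            backup.state = dict(primary.state)
            backup.next_package = i
    return worker.state
```

Calling `run_query(packages, Node("n1"), Node("n2"), fail_at=1)` returns the same aggregate as a run without `fail_at`, and only the interrupted package is processed twice. A real system would also have to handle network replication, failure detection, and output that has already been sent downstream, none of which are modelled here.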

Y. Bessho currently works for NTT.

Notes

1. The idea of PhoeniQ can be easily extended to a shared-nothing architecture [26]. Due to the space limitation, we will present further discussion in a separate paper.
2. For simplicity and due to the space limitation, this paper merely presumes a single-node crash failure of processing nodes. The same idea can be easily applied to other cases, such as a double-node failure. Another exploration is necessary to protect against a failure of the storage node.
3. As long as all the non-tail operators are stateless as we have assumed, the reprocessing causes only marginal overhead compared to the entire pipeline processing.

References

1. Oracle Berkeley DB. https://www.oracle.com/database/berkeley-db/db.html
2. The Internet of Things: Data from Embedded Systems Will Account for 10% of the Digital Universe by 2020. https://www.emc.com/leadership/digital-universe/2014iview/internet-of-things.htm
3. The TPC-H benchmark. http://www.tpc.org/tpch/
4. Abadi, D.J., et al.: The design of the borealis stream processing engine. In: Proceedings CIDR, pp. 277–289 (2005)
5. Boral, H., et al.: Prototyping bubba, a highly parallel database system. IEEE Trans. Knowl. Data Eng. 2(1), 4–24 (1990)
6. Borthakur, D.: Petabyte scale databases and storage systems at facebook. In: Proceedings SIGMOD, pp. 1267–1268 (2013)
7. Carney, D., et al.: Monitoring streams - a new class of data management applications. In: Proceedings VLDB, pp. 215–226 (2002)
8. Chandramouli, B., Bond, C.N., Babu, S., Yang, J.: Query suspend and resume. In: Proceedings SIGMOD, pp. 557–568 (2007)
9. Chandrasekaran, S., et al.: Telegraphcq: continuous dataflow processing for an uncertain world. In: Proceedings CIDR (2003)
10. Chaudhuri, S., Kaushik, R., Ramamurthy, R., Pol, A.: Stop-and-restart style execution for long running decision support queries. In: Proceedings VLDB, pp. 735–745 (2007)
11. Daniel Weeks: Netflix: Integrating Spark at petabyte scale. https://conferences.oreilly.com/strata/big-data-conference-ny-2015/public/schedule/detail/43373
12. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
13. DeWitt, D.J., Gray, J.: Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)
14. DeWitt, D.J., Madden, S., Stonebraker, M.: How to build a high-performance data warehouse. http://db.csail.mit.edu/madden/high_perf.pdf
15. Ghandeharizadeh, S., DeWitt, D.J.: Hybrid-range partitioning strategy: a new declustering strategy for multiprocessor database machines. In: Proceedings VLDB, pp. 481–492 (1990)
16. Goda, K., Tamura, T., Oguchi, M., Kitsuregawa, M.: Run-time load balancing system on san-connected PC cluster for dynamic injection of CPU and disk resource - a case study of data mining application. Proc. DEXA. 2453, 182–192 (2002)
17. Han, B., Omiecinski, E., Mark, L., Liu, L.: OTPM: failure handling in data-intensive analytical processing. In: Proceedings CollaborateCom, pp. 35–44. IEEE (2011)
18. Hauglid, J.O., Nørvåg, K.: Proqid: partial restarts of queries in distributed databases. In: Proceedings CIKM, pp. 1251–1260. ACM (2008)
19. Hwang, J., Xing, Y., Çetintemel, U., Zdonik, S.B.: A cooperative, self-configuring high-availability solution for stream processing. In: Proceedings ICDE, pp. 176–185 (2007)
20. Jeff Barr: Migration Complete - Amazon’s Consumer Business Just Turned off its Final Oracle Database. https://aws.amazon.com/blogs/aws/migration-complete-amazons-consumer-business-just-turned-off-its-final-oracle-database/
21. Kwon, Y., Balazinska, M., Greenberg, A.G.: Fault-tolerant stream processing using a distributed, replicated file system. Proc. VLDB 1(1), 574–585 (2008)
22. Pavlo, A., et al.: A comparison of approaches to large-scale data analysis. In: Proceedings SIGMOD, pp. 165–178 (2009)
23. Reza, S.: Uber’s Big Data Platform: 100+ Petabytes with Minute Latency. https://eng.uber.com/uber-big-data-platform/
24. Shah, M.A., Hellerstein, J.M., Brewer, E.: Highly available, fault-tolerant, parallel dataflows. In: Proceedings SIGMOD, pp. 827–838. ACM (2004)
25. Smith, J.E.T., Watson, P.: A rollback-recovery protocol for wide area pipelined data flow computations (2004)
26. Stonebraker, M.: The case for shared nothing. IEEE Database Eng. Bull. 9, 4–9 (1985)

Author information

Authors and Affiliations

The University of Tokyo, 7–3–1 Hongo, Bunkyo-ku, Tokyo, Japan
Yutaro Bessho, Yuto Hayamizu, Kazuo Goda & Masaru Kitsuregawa
National Institute of Informatics, 2–1–2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan
Masaru Kitsuregawa

Corresponding author

Correspondence to Yutaro Bessho.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Johannes Kepler University of Linz, Linz, Austria
Gabriele Kotsis
IFS, Vienna University of Technology, Vienna, Wien, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

About this paper

Cite this paper

Bessho, Y., Hayamizu, Y., Goda, K., Kitsuregawa, M. (2020). PhoeniQ: Failure-Tolerant Query Processing in Multi-node Environments. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_5

DOI: https://doi.org/10.1007/978-3-030-59003-1_5
Published: 14 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59002-4
Online ISBN: 978-3-030-59003-1
eBook Packages: Computer Science, Computer Science (R0)
