Abstract
Spatiotemporal data analysis plays a vital role in big data processing, and it is also a research hotspot in location-aware and recommender systems. In these applications, graph modeling and distributed iterative computing are the basis and guarantee for data query and mining. Because of the constant repeated execution of specific calculation logic, iterative jobs have the characteristics of being time-consuming and exerting high pressure on system resources. However, iterative jobs always face the risk of stopping due to computing node fault, which in turn causes serious economic losses. At present, the latest generation of distributed computing system Flink’s recovery strategy for node faults in batch processing mode is to restart the job from the beginning, which is extremely time-consuming. If the checkpoint mechanism in Flink’s stream-processing mode is used to recover from batch jobs failures, it will greatly increase the running time and storage overhead in trouble-free state. Therefore, a lightweight fault-tolerant mechanism is needed to reduce failure recovery time while ensuring the job efficiency of spatiotemporal data analysis. In view of the above situation and the characteristics of the iterative computing model for graph computing, a single-node failure recovery mechanism only for the failed node is proposed, which reduces the failure recovery time by introducing lightweight checkpoints and local logs. Based on the proposed single-node failure recovery mechanism, a failure recovery mechanism under multi-node fault and associated fault is proposed, which can cope with more complex failure situations occurs. Experimental results show that the proposed method can quickly and effectively recover jobs after failure, reducing the average recovery time by 37% in the case of single node fault, and reducing the average recovery time by 24% in the case of multi-node fault.











Similar content being viewed by others
Data availability
All data in the experiment is authoritative and available.
Code availability
All the codes in this research are available.
References
Cantarella, G.E., Improta, G., Sforza, A.: Iterative procedure for equilibrium network traffic signal setting. Transportation Research Part A General 25(5), 241–249 (1991)
Carbone, P., Katsifodimos, A., Kth, ., Sweden, S., Tzoumas, K.: Apache flink : Stream and batch processing in a single engine (2015)
Chandy, K.M., Lamport, L.: Distributed snapshots: Determining global states of a distributed system. Acm Trans on Computer Systems 3(1), 63–75 (2016)
Chen, L., Shang, S., Jensen, C.S., Yao, B., Zhang, Z., Shao, L.: Effective and efficient reuse of past travel behavior for route recommendation. KDD, 488–498 (2019)
Chen, D., Yuan, Y., Du, W., Cheng, Y., Wang, G.: Online route planning over time-dependent road networks. In: ICDE, pp. 325– 335. IEEE, ??? (2021)
Deo, N., Pang, C.Y.: Shortest path algorithms: a taxonomy and annotation (1984)
Dijkstra, E.W.: The distributed snapshot of k.m. chandy and l. lamport. Springer, Berlin (1986)
Doan, H., Zhang, W., Min, Z., Ogata, K.: Model checking chandy-lamport distributed snapshot algorithm revisited. In: International Symposium on Dependable Computing & Internet of Things (2016)
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. AAAI Press (1996)
Failure, H., Failure, H., Access, S.D., Access, S.D., Sets, L.D., Sets, L.D., Model, S.C., Model, S.C., Computation, M., Computation, M.: The hadoop distributed file system: Architecture and design. Hadoop Project Website 11(11), 1–10 (2007)
Fang, Y., Feng, C., Mammar, S., Che, A.: Iterative algorithm for lane reservation problem on transportation network. In: IEEE International Conference on Networking (2011)
Hartigan, J.A., Wong, M.A.: A k-means clustering algorithm. Appl. Stat. 28(1) (1979)
Iserngonzalez, J., Hernandezsosa, D., Fernandezperdomo, E., Cabreragamez, J., Dominguezbrito, A.C., Prietomaranon, V.: Path planning for underwater gliders using iterative optimization. In: IEEE International Conference on Robotics & Automation (2011)
Javed, M.A., Younis, M.S., Latif, S., Qadir, J., Baig, A.: Community detection in networks: A multidisciplinary review. Journal of Network and Computer Applications 108(APR.), 87–111 (2018)
Kambhatla, S., Walpole, J.: Recovery with limited replay: fault-tolerant processes in linda. In: Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, 1990. (1991)
Kunegis, J.: Handbook of network analysis [konect - the koblenz network collection]. Computer Science 2, 1343–1350 (2014)
Lou, Y.S., Zhang, W.Y., Xu, F., Wang, Y., Chen, S.: Parallel implementation of single-source shortest path algorithm based on haloop. Applied Mechanics & Materials 220–223, 2428–2432 (2012)
Luo, W.: A real-time fault-tolerant scheduling algorithm for distributed systems based on deferred active backup-copy. Journal of Computer Research and Development 44(44), 521–528 (2007)
Patriksson, M.: The traffic assignment problem: Models and methods. VSP (1994)
Pfoser, D., Tryfona, N., Jensen, C.S.: Indeterminacy and spatiotemporal data: Basic definitions and case study. Geoinformatica 9(3), 211–236 (2005)
Shang, S., Ding, R., Yuan, B., Xie, K., Zheng, K., Kalnis, P.: User oriented trajectory search for trip recommendation. EDBT, 156–167 (2012)
Shang, S., Chen, L., Wei, Z., Jensen, C.S., Wen, J., Kalnis, P.: Collective travel planning in spatial networks. IEEE Trans. Knowl. Data Eng. 28(5), 1132–1146 (2016)
Shang, S., Chen, L., Jensen, C.S., Wen, J., Kalnis, P.: Searching trajectories by regions of interest. IEEE Trans. Knowl. Data Eng. 29(7), 1549–1562 (2017)
Shang, S., Chen, L., Wei, Z., Jensen, C.S., Zheng, K., Kalnis, P.: Trajectory similarity join in spatial networks. Proc. VLDB Endow. 10(11), 1178–1189 (2017)
Shang, S., Chen, L., Wei, Z., Jensen, C.S., Zheng, K., Kalnis, P.: Parallel trajectory similarity joins in spatial networks. VLDB J. 27(3), 395–420 (2018)
Shang, S., Chen, L., Zheng, K., Jensen, C.S., Wei, Z., Kalnis, P.: Parallel trajectory-to-location join. IEEE Trans. Knowl. Data Eng. 31(6), 1194–1207 (2019)
Tian C, Hu Z, Vora K, Gupta R: Coral: Confined recovery in distributed asynchronous graph processing. In: Acm Sigplan Notices A Monthly Publication of the Special Interest Group on Programming Languages (2017)
Venkateswara, R.K.: Spatiotemporal data mining: Issues, tasks and applications. International Journal of Computer Science & Engineering Survey 3(1), 39–52 (2012)
Wang, Y., Yuan, Y., Wang, H., Zhou, X., Mu, C., Wang, G.: Constrained route planning over large multi-modal time-dependent networks. ICDE, 313–324 (2021)
Xu, C., Holzemer, M., Kaul, M., Soto, J., Markl, V.: On fault tolerance for distributed iterative dataflow processing. IEEE Transactions on Knowledge and Data Engineering PP, 1–1 (2017)
Yuan, Y., Lian, X., Wang, G., Chen, L., Ma, Y., Wang, Y.: Weight-constrained route planning over time-dependent graphs. ICDE, 914–925 (2019)
Yuan, Y., Lian, X., Chen, L., Wang, G., Yu, J.X., Wang, Y., Ma, Y.: Gcache: Neighborhood-guided graph caching in a distributed environment. IEEE Trans. Parallel Distributed Syst. 30(11), 2463–2477 (2019)
Yuan, Y., Lian, X., Wang, G., Ma, Y., Wang, Y.: Constrained shortest path query in a large time-dependent graph. Proc. VLDB Endow. 12(10), 1058–1070 (2019)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster computing with working sets (2010)
Zhang, Y., Gao, Q., Gao, L., Wang, C.: Imapreduce: A distributed computing framework for iterative computation. Journal of Grid Computing 10(1), 47–68 (2012)
Zhu, P., Yang, F., Tu, G.: Real-time fault-tolerant scheduling for distributed systems based on improving priority of passive backup. Journal of Computer Research and Development 47(11), 2003–2010 (2010)
Funding
This research was supported by the National Key R&D Program of China under Grant No. 2018YFB1004402; and the NSFC under Grant No. 61872072, 62072087, 61772124, 61932004, 61732003, and 61729201; and the Fundamental Research Funds for the Central Universities under Grant No. N2016009.
Author information
Authors and Affiliations
Contributions
conceptualization, Hangxu Ji; software, Hangxu Ji; methodology, Hangxu Ji and Yuhai Zhao; supervision, Hangxu Ji and Liuguo Wei; validation, Yuchen Fan; writing-original draft, Hangxu Ji; writing-review and editing, Gang Wu and Guoren Wang.
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare no conflict of interest.
Ethics approval
This article does not contain any studies involving human participants and/or animals by any of the authors.
Consent to participate
All authors have agreed to participate in the research described in this manuscript.
Consent for publication
All authors have read and agreed to the published version of the manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special Issue on Spatiotemporal Data Management and Analytics for Recommend
Guest Editors: Shuo Shang, Xiangliang Zhang, and Panos Kalnis
Rights and permissions
About this article
Cite this article
Ji, H., Wu, G., Zhao, Y. et al. A fault-tolerant optimization mechanism for spatiotemporal data analysis in flink. World Wide Web 26, 867–887 (2023). https://doi.org/10.1007/s11280-022-01006-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-022-01006-5