Abstract
As a non-linear data structure consisting of nodes and edges, the graph data span many different domains. In the real world, applications based on such data structures are always time-sensitive, that is, the value of graph data tends to decrease with time. Furthermore, the application based on spatio-temporal graph is one of the typical representatives of time-sensitive, since the time dimension is an inherent feature of spatio-temporal data. The Distributed Stream Processing Engine (DSPE) seems an excellent choice for the above requirement, which is commonly partitioned and concurrently processed by a number of threads to maximize the throughput. However, it is not feasible to do such mission directly using the traditional DSPE. In this paper, we propose a computational model suitable for handling the spatio-temporal graph in DSPE, by reconstructing the DSPE’s parallel processing slots. Specifically, our proposal includes a general processing framework to deal with the data structure of the spatio-temporal graph, a state information compensation mechanism to ensure the correctness of processing such stateful operation in DSPE, a lightweight summary information calculation method to ensure the performance of the system. Empirical studies on real-world stream applications validate the usefulness of our proposals and prove the considerable advantage of our approaches over state-of-the-art solutions in the literature.
Similar content being viewed by others
References
Apache Flink Project. http://flink.apache.org/
Apache Spark Project. http://spark.apache.org/
Apache Storm Project. http://storm.apache.org/
Bakalov, P., Hadjieleftheriou, M., Keogh, E., Tsotras, V.J.: Efficient trajectory joins using symbolic representations. In: Proceedings of the 6th international conference on Mobile data management, pages 86–93. ACM (2005)
Bakalov, P., Hadjieleftheriou, M., Tsotras, V.J.: Time relaxed spatiotemporal trajectory joins. In: Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems, pp 182–191. ACM (2005)
Balkesen, C., Tatbul, N.: Scalable data partitioning techniques for parallel sliding window processing over data streams. In: International Workshop on Data Management for Sensor Networks (DMSN) (2011)
Bruno, N., Kwon, Y., Wu, M.-C.: Advanced join strategies for large-scale distributed computation. Proc. VLDB Endow. 7(13), 1484–1495 (2014)
Cao, P., Wang, Z.: Efficient top-k query calculation in distributed networks. In: Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, pp 206–215. ACM (2004)
Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp 491–502. ACM (2005)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: The count-min sketch and its applications. J. Algor. 55(1), 58–75 (2005)
Ding, J., Fang, J., Zhang, Z., Zhao, P., Xu, J., Zhao, L.: Real-time trajectory similarity processing using longest common subsequence. In: Proceedings of the 21st High Performance Computing and Communications. IEEE (To appear)
Dubuisson, M.-P., Jain, A.K.: A modified Hausdorff distance for object matching. In: Proceedings of 12th International Conference on Pattern Recognition, vol. 1, pp 566–568. IEEE (1994)
Eiter, T., Mannila, H.: Computing Discrete Fréchet Distance. Technical report, Citeseer (1994)
Gedik, B.: Partitioning functions for stateful data parallelism in stream processing. VLDB J. Int. J. Very Large Data Bases 23(4), 517–539 (2014)
Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: Graph processing in a distributed dataflow framework. In: 11th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 14), pp 599–613 (2014)
Haan, H., Streb, J., Bien, S., Rösler, F.: Individual cortical current density reconstructions of the semantic n400 effect: Using a generalized minimum norm model with different constraints (l1 and l2 norm). Hum. Brain Mapp. 11(3), 178–192 (2000)
Ji, S., Mittal, P., Beyah, R.: Graph data anonymization, de-anonymization attacks, and de-anonymizability quantification: a survey. IEEE Commun. Surveys Tutor. 19(2), 1305–1326 (2017)
Li, L., Zheng, K., Wang, S., Hua, W., Zhou, X.: Go slow to go fast: Minimal on-road time route scheduling with parking facilities using historical trajectory. VLDB J. Int. J. Very Large Data Bases 27(3), 321–345 (2018)
Lian, D., Zheng, K., Ge, Y., Cao, L., Chen, E., Xie, X.: Geomf++: Scalable location recommendation via joint geographical modeling and matrix factorization. ACM Trans. Inf. Syst. (TOIS) 36(3), 33 (2018)
Liu, G., Liu, Y., Zheng, K., Liu, A., Li, Z., Wang, Y., Zhou, X.: Mcs-gpm: Multi-constrained simulation based graph pattern matching in contextual social graphs. IEEE Trans. Knowl. Data Eng. 30(6), 1050–1064 (2017)
Nasir, M.A.U., Morales, G.D.F., Garcia-Soriano, D., Kourtellis, N., Serafini, M.: The power of both choices: Practical load balancing for distributed stream processing engines. In: 2015 IEEE 31st International Conference on Data Engineering, pp 137–148. IEEE (2015)
Nasir, M.A.U., Morales, G.D.F., Kourtellis, N., Serafini, M.: When two choices are not enough: Balancing at scale in distributed stream processing. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp 589–600. IEEE (2016)
Paterson, M., Dančík, V.: Longest common subsequences. In: International Symposium on Mathematical Foundations of Computer Science, pp 127–142. Springer (1994)
Rivetti, N., Querzoni, L., Anceaume, E., Busnel, Y., Sericola, B.: Efficient key grouping for near-optimal load balancing in stream processing systems. In: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems, pp 80–91. ACM (2015)
Shang, S., Ding, R., Yuan, B., Xie, K., Zheng, K., Kalnis, P.: User oriented trajectory search for trip recommendation. In: Proceedings of the 15th International Conference on Extending Database Technology, pp 156–167. ACM (2012)
Shang, S., Chen, L., Wei, Z., Jensen, C.S., Zheng, K., Kalnis, P.: Trajectory similarity join in spatial networks. Proc. VLDB Endow. 10(11), 1178–1189 (2017)
Tsourakakis, C., Gkantsidis, C., Radunovic, B., Vojnovic, M.: Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp 333–342. ACM (2014)
Vitorovic, A., Elseidy, M., Koch, C.: Load balancing and skew resilience for parallel joins. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp 313–324. IEEE (2016)
Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., Keogh, E.: Indexing multi-dimensional time-series with support for multiple distance measures. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 216–225. ACM (2003)
Wang, H., Su, H., Zheng, K., Sadiq, S., Zhou, X.: An effectiveness study on trajectory similarity measures. In: Proceedings of the Twenty-Fourth Australasian Database Conference, vol. 137, pp 13–22. Australian Computer Society, Inc. (2013)
Xie, D., Li, F., Phillips, J.M.: Distributed trajectory similarity search. Proc. VLDB Endow. 10(11), 1478–1489 (2017)
Xu, Y., Kostamaa, P., Zhou, X., Chen, L.: Handling data skew in parallel joins in shared-nothing systems. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp 1043–1052. ACM (2008)
Yi, B.-K., Jagadish, H., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In: Proceedings 14th International Conference on Data Engineering, pp 201–208. IEEE (1998)
Yin, H., Zhou, X., Cui, B., Wang, H., Zheng, K., Nguyen, Q.V.H.: Adapting to user interest drift for poi recommendation. IEEE Trans. Knowl. Data Eng. 28(10), 2566–2581 (2016)
Yu, H., Li, H.-G., Wu, P., Agrawal, D., El Abbadi, A.: Efficient processing of distributed top-k queries. In: International Conference on Database and Expert Systems Applications, pp 65–74. Springer (2005)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: Cluster computing with working sets. HotCloud 10(10–10), 95 (2010)
Zeinalipour-Yazti, D., Vagena, Z., Gunopulos, D., Kalogeraki, V., Tsotras, V., Vlachos, M., Koudas, N., Srivastava, D.: The threshold join algorithm for top-k queries in distributed sensor networks. In: Proceedings of the 2nd International Workshop on Data Management for Sensor Networks, pp 61–66. ACM (2005)
Zeinalipour-Yazti, D., Lin, S., Gunopulos, D.: Distributed spatio-temporal similarity search. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management, pp 14–23. ACM (2006)
Zhao, Y., Zheng, K., Li, Y., Su, H., Liu, J., Zhou, X.: Destination-aware task assignment in spatial crowdsourcing: A worker decomposition approach. IEEE Transactions on Knowledge and Data Engineering (2019)
Zheng, K., Shang, S., Yuan, N.J., Yang, Y.: Towards efficient search for activity trajectories. In: 2013 IEEE 29Th International Conference on Data Engineering (ICDE), pp 230–241. IEEE (2013)
Zheng, K., Zheng, Y., Yuan, N.J., Shang, S., Zhou, X.: Online discovery of gathering patterns over trajectories. IEEE Trans. Knowl. Data Eng. 26(8), 1974–1988 (2013)
Zheng, K., Su, H., Zheng, B., Shang, S., Xu, J., Liu, J., Zhou, X.: Interactive top-k spatial keyword queries. In: 2015 IEEE 31st International Conference on Data Engineering, pp 423–434. IEEE (2015)
Zheng, B., Su, H., Hua, W., Zheng, K., Zhou, X., Li, G.: Efficient clue-based route search on road networks. IEEE Trans. Knowl. Data Eng. 29 (9), 1846–1859 (2017)
Zheng, K., Zhao, Y., Lian, D., Zheng, B, Liu, G., Zhou, X.: Reference-based framework for spatio-temporal trajectory compression and query processing. IEEE Transactions on Knowledge and Data Engineering (2019)
Acknowledgments
This work is partially supported by NSFC (No.61802273), the Postdoctoral Science Foundation of China under Grant (No. 2017M621813), the Postdoctoral Science Foundation of Jiangsu Province of China under Grant (No. 2018K029C), and the Natural Science Foundation for Colleges and Universities in Jiangsu Province of China under Grant (No. 18KJB520044).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article belongs to the Topical Collection: Special Issue on Graph Data Management in Online Social Networks
Guest Editors: Kai Zheng, Guanfeng Liu, Mehmet A. Orgun, and Junping Du
Rights and permissions
About this article
Cite this article
Fang, J., Ding, J., Zhao, P. et al. Distributed and parallel processing for real-time and dynamic spatio-temporal graph. World Wide Web 23, 905–926 (2020). https://doi.org/10.1007/s11280-019-00741-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-019-00741-6