Abstract
Temporal data is ubiquitous, and massive amounts of it are generated nowadays. Managing big temporal data is important yet challenging, and processing it with a distributed system is a desirable choice. However, existing distributed systems and methods either cannot support native queries or are disk-based solutions, which cannot adequately satisfy the requirements of high throughput and low latency. To alleviate this issue, this paper proposes an In-memory based Two-level Index Solution in Spark (ITISS) for processing big temporal data. The framework of our system is easy to understand and implement without sacrificing efficiency. We conduct extensive experiments to verify the performance of our solution. Experimental results on both real and synthetic datasets consistently demonstrate that our solution is efficient and competitive.
Acknowledgments
This work was supported by the National Basic Research Program (973 Program, No. 2015CB352403), the NSFC (U1636210, 61729202, 91438121, 61672351, 61472453, U1401256, U1501252, U1611264, U1711261 and U1711262), the National Key Research and Development Program of China (2016YFB0700502), the Scientific Innovation Act of STCSM (15JC1402400), the Opening Projects of Guangdong Key Laboratory of Big Data Analysis and Processing (201808), Guangdong Province Key Laboratory of Popular High Performance Computers of Shenzhen University (SZU-GDPHPCL2017), and Microsoft Research Asia.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Yao, B. et al. (2018). Distributed In-Memory Analytics for Big Temporal Data. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_36
DOI: https://doi.org/10.1007/978-3-319-91452-7_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science, Computer Science (R0)