Elite: an elastic infrastructure for big spatiotemporal trajectories

  • Regular Paper
The VLDB Journal

Abstract

As the volumes of spatiotemporal trajectory data continue to grow at a rapid pace, a new generation of data management techniques is needed to utilize these data to provide a range of data-driven services, including geographic-type services. Key challenges posed by spatiotemporal data include the massive data volumes, the high velocity with which the data are captured, the need for interactive response times, and the inherent inaccuracy of the data. We propose an infrastructure, Elite, that leverages peer-to-peer and parallel computing techniques to address these challenges. The infrastructure offers efficient, parallel update and query processing by organizing the data into a layered index structure that is logically centralized but physically distributed among computing nodes. The infrastructure is elastic with respect to storage, meaning that it adapts to fluctuations in the storage volume, and with respect to computation, meaning that the degree of parallelism can be adapted to best match the computational requirements. Further, the infrastructure offers advanced functionality, including probabilistic simulations, for contending with the inaccuracy of the underlying data in query processing. Extensive empirical studies offer insight into properties of the infrastructure and indicate that it meets its design goals, thus enabling the effective management of big spatiotemporal data.

Notes

  1. The minimum capacity is below half of the maximum capacity. Otherwise, the condensed node may exceed the maximum capacity.

  2. Details on the *node are covered in Sect. 4.3.1.

  3. From experiments, we obtained \(\alpha =5e^{-4}\), \(\beta =1e^{-6}\), and \(\gamma =1e^{-6}\).

  4. http://iapg.jade-hs.de/personen/brinkhoff/generator/.

  5. http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/.

  6. http://en.wikipedia.org/wiki/DifferentialGPS.

  7. http://www.mongodb.org.

  8. https://github.com/couchbase/geocouch.

  9. There exist different semantics for top-k queries over uncertain data, such as U-TopK, U-kRanks, Expected scores, and Expected ranks. Among them, Expected scores and Expected ranks might be the best ones in terms of properties such as Containment and Unique-Rank [39].

References

  1. Ceikute, V., Jensen, C.S.: Vehicle routing with user-generated trajectory data. In: MDM (2015)

  2. Yang, B., Guo, C., Ma, Y., Jensen, C.S.: Toward personalized, context-aware routing. VLDB J. 24(2), 297–318 (2015)

  3. Dai, J., Yang, B., Guo, C., Jensen, C.S.: Efficient and accurate path cost estimation using trajectory data. In: CoRR abs/1510.02886 (2015)

  4. Stougiannis, A., Pavlovic, M., Tauheed, F., et al.: Data-driven neuroscience: enabling breakthroughs via innovative data management. In: SIGMOD (2013)

  5. Manyika, J., Chui, M.: Big data: the next frontier for innovation, competition, and productivity. In: McKinsey Global Institute (2011)

  6. Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD (2003)

  7. Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Querying imprecise data in moving object environments. TKDE 16(9), 1112–1127 (2004)

  8. Trajcevski, G., Tamassia, R., Ding, H., et al.: Continuous probabilistic nearest-neighbor queries for uncertain trajectories. In: EDBT (2009)

  9. Civilis, A., Jensen, C.S., Pakalnis, S.: Techniques for efficient road-network-based tracking of moving objects. TKDE 17(5), 698–712 (2005)

  10. Jensen, C.S., Pakalnis, S.: Trax - real-world tracking of moving objects. In: VLDB (2007)

  11. Eldawy, A., Li, Y., Mokbel, M.F., Janardan, R.: CG_Hadoop: computational geometry in MapReduce. In: GIS (2013)

  12. Eldawy, A., Mokbel, M.F.: A demonstration of SpatialHadoop: an efficient MapReduce framework for spatial data. In: VLDB (2013)

  13. Aji, A., Wang, F., Vo, H., et al.: Hadoop-GIS: a high performance spatial data warehousing system over MapReduce. In: VLDB (2013)

  14. Nishimura, S., Das, S., Agrawal, D., El Abbadi, A.: MD-HBase: a scalable multi-dimensional data infrastructure for location aware services. In: MDM (2011)

  15. Wang, J., Wu, S., Gao, H., et al.: Indexing multi-dimensional data in a cloud system. In: SIGMOD (2010)

  16. Tsatsanifos, G., Sacharidis, D., Sellis, T.: Index-based query processing on distributed multidimensional data. GeoInformatica 17(3), 489–519 (2013)

  17. Ratnasamy, S., Francis, P., Handley, M., et al.: A scalable content-addressable network. In: SIGCOMM (2001)

  18. Wei, L.Y., Zheng, Y., Peng, W.C.: Constructing popular routes from uncertain trajectories. In: KDD (2012)

  19. Pei, T., Zhou, C., Zhu, A.-X., et al.: Windowed nearest neighbour method for mining spatio-temporal clusters in the presence of noise. In: International Journal of Geographical Information Science (2010)

  20. Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB (2004)

  21. Pfoser, D., Jensen, C.S.: Capturing the uncertainty of moving-objects representations. In: SSDBM (1999)

  22. Lian, X., Chen, L.: Monochromatic and bichromatic reverse skyline search over uncertain databases. In: SIGMOD (2008)

  23. Kriegel, H.P., Kunath, P., Renz, M.: Probabilistic nearest-neighbor query on uncertain objects. In: DASFAA (2007)

  24. Pugh, W.: Concurrent maintenance of lists. Technical report, Dept. of Computer Science, University of Maryland (1990)

  25. Gargantini, I.: An effective way to represent octrees. Commun. ACM 25(12), 905–910 (1982)

  26. Sidlauskas, D., Saltenis, S., Christiansen, C.W., et al.: Trees or grids?: indexing moving objects in main memory. In: GIS (2009)

  27. Sidlauskas, D., Saltenis, S., Jensen, C.S.: Processing of extreme moving-object update and query workloads in main memory. VLDB J. 23(5), 817–841 (2014)

  28. Cheng, R., Chen, J., Mokbel, M., Chow, C.Y.: Probabilistic verifiers: evaluating constrained nearest-neighbor queries over uncertain data. In: ICDE (2008)

  29. You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In: ICDE Workshops (2015)

  30. Trajcevski, G., Wolfson, O., Zhang, F., Chamberlain, S.: The geometry of uncertainty in moving object databases. In: EDBT (2002)

  31. Zheng, K., Trajcevski, G., Zhou, X., Scheuermann, P.: Probabilistic range queries for uncertain trajectories on road networks. In: EDBT (2011)

  32. Zheng, K., Fung, G.P.C., Zhou, X.: K-nearest neighbor search for fuzzy objects. In: SIGMOD (2010)

  33. Xie, X., Yiu, M.L., Cheng, R., Lu, H.: Scalable evaluation of trajectory queries over imprecise location data. TKDE 26(8), 2029–2044 (2014)

  34. Tao, Y., Papadias, D.: MV3R-Tree: a spatio-temporal access method for timestamp and interval queries. In: VLDB (2001)

  35. Pfoser, D., Jensen, C.S., Theodoridis, Y.: Novel approaches to the indexing of moving object trajectories. In: VLDB (2000)

  36. Chakka, V.P., Everspaugh, A.C., Patel, J.M., et al.: Indexing large trajectory data sets with SETI. In: The first biennial conference on innovative data systems research (CIDR) (2003). http://www.cidrdb.org/cidr2003/program/p15.pdf

  37. Tsatsanifos, G., Sacharidis, D., Sellis, T.: RIPPLE: a scalable framework for distributed processing of rank queries. In: EDBT (2014)

  38. The Apache Cassandra project. http://cassandra.apache.org/

  39. Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE (2009)

  40. Born, M.: On the stability of crystal lattices. IX. Covariant theory of lattice deformations and the stability of some hexagonal lattices. In: Proceedings of the Cambridge Philosophical Society 38 (1942)

Acknowledgments

This work was supported by the 973 Program under Grant No. 2012CB316205, a grant from the Obel Family Foundation, and the National Science Foundation of China under Grant No. 61432006. The work was done in part while some of the authors visited the SA Center for Big Data Research at Renmin University of China. The center is partially funded by the Chinese National “111” Project “Attracting International Talents in Data Engineering and Knowledge Engineering Research”.

Author information

Corresponding author

Correspondence to Jinchuan Chen.

Appendices

Appendix 1: Algorithms

[Figures c, d, and e: algorithm listings]

Appendix 2: Routing cost estimation

Lemma 1

The maximum routing cost of a three-dimensional torus of N nodes is approximately equal to \(0.91 N^{\frac{1}{3}}\) hops.

Proof

For any torus node n, it takes one hop to reach its six neighbors (first-order neighbors) and two hops to reach its 18 second-order neighbors. The number of i-th-order neighbors \(a_i\) can be represented by [40]:

$$\begin{aligned} a_i = 2+ 4i^2 \end{aligned}$$

Suppose the furthest node on the torus is m hops from node n. The total number of nodes N then equals the sum of the numbers of first-order to m-th-order neighbors. Keeping only the leading term \(\frac{4}{3}m^3\) of that sum, we have:

$$\begin{aligned} \sum _{i=1}^{m} a_i = N \Rightarrow \sum _{i=1}^m \left( 2+ 4i^2\right) = N \Rightarrow m \approx \left( \frac{3}{4}N\right) ^{\frac{1}{3}} \end{aligned}$$

Thus, the maximum routing cost equals the distance to n’s m-th-order neighbor, i.e., approximately \(0.91N^{\frac{1}{3}}\) hops, since \(\left( \frac{3}{4}\right) ^{\frac{1}{3}} \approx 0.91\). \(\square \)
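As a quick numerical check of Lemma 1, the following minimal sketch accumulates the shell sizes \(a_i = 2 + 4i^2\) until N nodes are covered and compares the resulting hop count with the closed-form estimate. The function names are hypothetical and chosen for illustration; the sketch only mirrors the counting argument of the proof and is not part of the Elite implementation.

```python
def shell_size(i: int) -> int:
    """Number of i-th-order neighbors, a_i = 2 + 4 i^2 (taken from the proof)."""
    return 2 + 4 * i * i


def max_hops_exact(n_nodes: int) -> int:
    """Smallest m such that shells 1..m together hold at least n_nodes nodes."""
    covered, m = 0, 0
    while covered < n_nodes:
        m += 1
        covered += shell_size(m)
    return m


def max_hops_estimate(n_nodes: int) -> float:
    """Closed-form approximation m ~ (3N/4)^(1/3) from Lemma 1."""
    return (0.75 * n_nodes) ** (1.0 / 3.0)


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(n, max_hops_exact(n), round(max_hops_estimate(n), 2))
```

Under these assumptions, the exact count and the estimate should differ by less than one hop for large N, because the neglected terms of the sum are of lower order in m.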

Lemma 2

The average routing cost of a three-dimensional torus of N nodes is approximately equal to \(0.69 N^{\frac{1}{3}}\) hops.

Proof

Based on Lemma 1, the furthest node from n is m hops away. The average number of hops is then:

$$\begin{aligned} \textit{avg\_number\_of\_hops} &= \frac{1}{N} \sum _{i=1}^{m} a_i \cdot i = \frac{1}{N} \sum _{i=1}^{m} \left( 2i + 4 i^3\right) \\ &= \frac{1}{N} \left( m^4+2m^3+2m^2+m\right) \\ &\approx 0.69 N^{\frac{1}{3}} \end{aligned}$$
(6)

\(\square \)
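The average in Lemma 2 can also be evaluated directly from the shell sizes. The sketch below is an illustrative check under the same assumption \(a_i = 2 + 4i^2\); the helper names are hypothetical. It computes the mean hop count over all nodes within m hops and its ratio to \(N^{\frac{1}{3}}\), which for large m should land near the constant stated in the lemma.

```python
def average_hops(m: int) -> float:
    """Mean hop count over all nodes within m hops, weighting hop i by a_i."""
    total_nodes = sum(2 + 4 * i * i for i in range(1, m + 1))
    total_hops = sum(i * (2 + 4 * i * i) for i in range(1, m + 1))
    return total_hops / total_nodes


def ratio_to_cube_root(m: int) -> float:
    """average_hops(m) divided by N^(1/3), where N is the node count within m hops."""
    n_nodes = sum(2 + 4 * i * i for i in range(1, m + 1))
    return average_hops(m) / n_nodes ** (1.0 / 3.0)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        print(m, round(average_hops(m), 3), round(ratio_to_cube_root(m), 3))
```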

Appendix 3: Estimation of h

Let us consider the cost estimation on torus node \(T_i\). After the range search \(Q_i \oplus d_{{ max }} \oplus U_{{ max }}\) (Step 6 in Algorithm 2), we get a set \(C_i\) of candidate trajectories with the average length \(\overline{{\mathcal {T}}_c\cdot \varDelta t}=\frac{1}{|C_i|} \sum _{{\mathcal {T}} \in C_i} {\mathcal {T}}\cdot \varDelta t\). According to Definition 5, the cost of STNNQ depends on the number of trajectories at each snapshot. To estimate that, we first assume the trajectories are uniformly distributed in the spatiotemporal region \(Q_i \oplus d_{{ max }} \oplus U_{{ max }}\).

$$\begin{aligned} \text{\# of objects per snapshot} = \frac{ \overline{{\mathcal {T}}_c\cdot \varDelta t} \cdot |C_i| }{|Q_i\cdot \varDelta t|} \end{aligned}$$
(7)

We define the density \(\rho \) as the number of objects per snapshot divided by the area of the filtering bound, \(\pi (d_{{ max }}+U_{{ max }})^2\).
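To make Eq. (7) and the definition of \(\rho \) concrete, here is a minimal sketch. It assumes that each candidate trajectory is summarized by the length of its time span and that the query time span is a single number in the same unit; the function and parameter names are hypothetical.

```python
import math
from typing import Sequence


def objects_per_snapshot(candidate_spans: Sequence[float], query_span: float) -> float:
    """Eq. (7): (average candidate time span * |C_i|) / query time span."""
    if not candidate_spans:
        return 0.0
    mean_span = sum(candidate_spans) / len(candidate_spans)
    return mean_span * len(candidate_spans) / query_span


def density(per_snapshot: float, d_max: float, u_max: float) -> float:
    """rho: objects per snapshot divided by the filtering-bound area pi*(d_max+u_max)^2."""
    return per_snapshot / (math.pi * (d_max + u_max) ** 2)


if __name__ == "__main__":
    per_snap = objects_per_snapshot([10.0, 20.0, 30.0], query_span=60.0)
    print(per_snap, density(per_snap, d_max=1.0, u_max=0.5))
```

Note that the numerator of Eq. (7) is simply the total time covered by all candidates, so three candidates spanning 10, 20, and 30 time units over a 60-unit query amount to one object per snapshot on average.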

Lemma 3

Assume a two-dimensional region S in the spatial domain \({\mathfrak {S}}\), where the points are uniformly distributed, and let \(N(S) = m\) denote the event that there are m points inside region S. The probability of \(N(S) = m\) is given by:

$$\begin{aligned} P(N(S)=m) = \frac{(\rho |S|)^m e^{-\rho |S|}}{m!} \end{aligned}$$
(8)

Proof

The probability that m out of n points are in S is:

$$\begin{aligned} P(N(S)=m)= \binom{n}{m} \left( \frac{|S|}{|{\mathfrak {S}}|}\right) ^m \left( 1-\frac{|S|}{|{\mathfrak {S}}|}\right) ^{n-m} \end{aligned}$$

The limiting form of the binomial distribution, as n grows large while \(|S|/|{\mathfrak {S}}|\) stays small, is a Poisson distribution. Let \(\rho = \frac{n}{|{\mathfrak {S}}|}\). The above equation becomes:

$$\begin{aligned} P(N(S)=m) = \frac{(\rho |S|)^me^{-\rho |S|}}{m!} \end{aligned}$$

Then, the probability that there is at least one point in S is:

$$\begin{aligned} P(N \ge 1) &= \sum _{i=1}^{\infty } P(N = i) = \sum _{i=1}^{\infty } \frac{(\rho |S|)^ie^{-\rho |S|}}{i!}\\ &= 1 - e^{-\rho |S|} \end{aligned}$$

\(\square \)
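The Poisson limit used in the proof can be checked numerically. The sketch below is illustrative only, with hypothetical function names: it evaluates the binomial probability for a large n and a small area fraction, the Poisson probability with mean \(\rho |S|\), and the resulting probability of at least one point in S.

```python
import math


def binomial_pmf(n: int, m: int, p: float) -> float:
    """P(exactly m of n uniformly placed points fall in S), with p = |S| / |domain|."""
    return math.comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def poisson_pmf(m: int, mean: float) -> float:
    """Poisson probability with mean rho * |S|."""
    return mean ** m * math.exp(-mean) / math.factorial(m)


def prob_at_least_one(rho: float, area: float) -> float:
    """P(N >= 1) = 1 - exp(-rho * |S|)."""
    return 1.0 - math.exp(-rho * area)


if __name__ == "__main__":
    n, p = 100_000, 2e-5  # mean = n * p = 2 points expected in S
    for m in range(5):
        print(m, binomial_pmf(n, m, p), poisson_pmf(m, n * p))
```

For these values the two columns should agree to several decimal places, which is the sense in which the binomial expression is replaced by the Poisson form.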

Setting \(P(N \ge 1) = P^*\) and solving for \(|S|\) gives the area of a circular region S around the query point that contains a nearest neighbor with probability \(P^*\). In our implementation, we set \(P^*\) to 0.9, which is reasonably large for S to contain the nearest neighbor.

$$\begin{aligned} |S| = -\frac{\ln \left( 1-P^*\right) }{\rho } \end{aligned}$$
(9)

The number of candidate objects per snapshot is estimated as:

$$\begin{aligned} h = \rho (|S \oplus U_{{ max }}|). \end{aligned}$$
(10)
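Equations (9) and (10) can be combined into a short calculation. The sketch below rests on one assumption that is not spelled out in the text: \(S \oplus U_{{ max }}\) is interpreted as the circle S with its radius enlarged by \(U_{{ max }}\). The function names are hypothetical.

```python
import math


def search_area(rho: float, p_star: float = 0.9) -> float:
    """Eq. (9): area |S| that holds at least one point with probability p_star."""
    return -math.log(1.0 - p_star) / rho


def estimate_h(rho: float, u_max: float, p_star: float = 0.9) -> float:
    """Eq. (10): h = rho * |S (+) U_max|.

    Assumption: S is a circle, and expanding it by U_max adds u_max to its radius.
    """
    radius = math.sqrt(search_area(rho, p_star) / math.pi)
    expanded_area = math.pi * (radius + u_max) ** 2
    return rho * expanded_area


if __name__ == "__main__":
    print(search_area(rho=2.0), estimate_h(rho=2.0, u_max=0.5))
```

With \(U_{{ max }} = 0\), the estimate reduces to \(-\ln (1-P^*)\) regardless of the density, so any dependence of h on \(\rho \) enters only through the uncertainty expansion.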

Appendix 4: Obtaining \(d_{\mathrm{max}}\)

In our system, we try a series of range queries \(Q_i \oplus d \oplus U_{{ max }}\) to incrementally obtain \(d_{{ max }}\), where \(d=5, 10, 20\,\%, \cdots \) of torus node \(T_i\)’s spatial domain size. Upon collecting the candidate trajectory set \(C_i\) by the range search parameterized with d, we test whether the union of these trajectories’ time spans covers \(Q_i\)’s \(\varDelta t\), i.e., whether \(\cup _{UT \in C_i}UT\cdot \varDelta t \supseteq Q_i\cdot \varDelta t\) holds. If it does, there exists at least one object for each timestamp in \(Q_i\cdot \varDelta t\). Therefore, the current d is taken as \(d_{{ max }}\), which is sufficiently large not to miss any possible candidate trajectory. Otherwise, we increase d and repeat the process.
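The incremental search described above can be sketched as follows. This is a simplified illustration, not the system's code: `range_search` is a hypothetical callback standing in for the range query \(Q_i \oplus d \oplus U_{{ max }}\) and returning the time spans of the candidates, time is treated as continuous intervals rather than discrete timestamps, and the doubling of the fraction after 5 % is an assumption read off the listed values 5, 10, 20 %.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[float, float]


def covers(spans: List[Interval], query: Interval) -> bool:
    """True if the union of the spans contains the closed query interval."""
    reach = query[0]  # everything in [query[0], reach] is covered so far
    for start, end in sorted(spans):
        if start > reach:
            break  # gap before the next span begins
        reach = max(reach, end)
        if reach >= query[1]:
            return True
    return False


def find_d_max(
    range_search: Callable[[float], List[Interval]],
    query: Interval,
    domain_size: float,
    first_fraction: float = 0.05,
) -> Optional[float]:
    """Grow d (5 %, 10 %, 20 %, ... of the domain size) until candidates cover the query span."""
    fraction = first_fraction
    while True:
        d = min(fraction, 1.0) * domain_size
        if covers(range_search(d), query):
            return d
        if fraction >= 1.0:
            return None  # even the whole domain does not cover the query time span
        fraction *= 2.0
```

A caller would supply its own `range_search`, for example a function that filters an in-memory list of trajectories by distance; returning `None` when the full domain is insufficient is a design choice of this sketch rather than something the text specifies.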

About this article

Cite this article

Xie, X., Mei, B., Chen, J. et al. Elite: an elastic infrastructure for big spatiotemporal trajectories. The VLDB Journal 25, 473–493 (2016). https://doi.org/10.1007/s00778-016-0425-6

  • DOI: https://doi.org/10.1007/s00778-016-0425-6
