Abstract
We coin the term geoMobile data to emphasize datasets that exhibit geo-spatial features reflective of human behaviors. We propose and develop an EPIC framework to mine latent patterns from geoMobile data and provide meaningful interpretations: we first ‘E’xtract latent features from high-dimensional geoMobile datasets via Laplacian Eigenmaps and perform clustering in this latent feature space; we then use a state-of-the-art visualization technique to ‘P’roject these latent features into 2D space; and finally we obtain meaningful ‘I’nterpretations by ‘C’ulling cluster-specific significant feature sets. We illustrate that the local space contraction property of our approach is superior to that of other major dimension-reduction techniques. Using diverse real-world geoMobile datasets, we show the efficacy of our framework via three case studies.
Notes
Although a call involving two neighboring towers may semantically qualify as local, we regard a call as local solely from the tower’s perspective: both the caller and the callee of the call are associated with the same tower.
References
Alsheikh, M.A., Niyato, D., Lin, S., Tan, H.P., Han, Z.: Mobile big data analytics using deep learning and Apache Spark. IEEE Netw. 30(3), 22–29 (2016). https://doi.org/10.1109/MNET.2016.7474340
Baratchi, M., Meratnia, N., Havinga, P.J.M., et al.: A hierarchical hidden semi-Markov model for modeling mobility data. In: ACM Ubicomp (2014)
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)
Bengio, Y., Paiement, J.F., Vincent, P., et al.: Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In: NIPS (2004)
Fan, Z., Song, X., Shibasaki, R.: CitySpectrum: A non-negative tensor factorization approach. In: ACM Ubicomp (2014)
Hristova, D., Williams, M.J., Musolesi, M., et al.: Measuring urban social diversity using interconnected geo-social networks. In: ACM WWW (2016)
Ihler, A.T., Smyth, P.: Learning time-intensity profiles of human activity using non-parametric Bayesian models. In: NIPS (2006)
Kling, F., Pozdnoukhov, A.: When a city tells a story: urban topic analysis. In: ACM SIGSPATIAL (2012)
Krishnamurthy, A.: High-dimensional clustering with sparse gaussian mixture models. Unpublished paper, pp. 191–192 (2011)
Lakhina, A., Crovella, M., Diot, C.: Diagnosing network-wide traffic anomalies. In: ACM SIGCOMM Computer Communication Review (2004)
Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.Y.: Traffic flow prediction with big data: a deep learning approach. IEEE Trans. Intell. Transp. Syst. 16(2), 865–873 (2015)
Zelnik-Manor, L., Perona, P.: Self-tuning spectral clustering. In: NIPS (2005)
Shenzhen Metro: Subway construction plan. http://www.szpl.gov.cn/main/zsgg/200707090211041.shtml (2015)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: Analysis and an algorithm. In: NIPS (2002)
Ozakin, A., Vasiloglou, N., Gray, A.: Manifold learning theory and applications. CRC Press, Boca Raton (2011)
Pressley, A.: Elementary Differential Geometry. Springer, Berlin (2010)
Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: Explicit invariance during feature extraction. In: ICML (2011)
Robusto, C.: The cosine-haversine formula. Am. Math. Mon. 64(1), 38–40 (1957)
Schiebinger, G., Wainwright, M.J., Yu, B., et al.: The geometry of kernelized spectral clustering. Ann. Stat. 43(2), 819–846 (2015)
Städler, N., Mukherjee, S.: Penalized estimation in high-dimensional hidden Markov models with state-specific graphical models. Ann. Appl. Stat. (2013)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
Wallach, H.M., Mimno, D.M., McCallum, A.: Rethinking LDA: Why priors matter. In: NIPS (2009)
Wang, Z., Hu, K., Xu, K., et al.: Structural analysis of network traffic matrix via relaxed principal component pursuit. Comput. Netw. (2012)
Witayangkurn, A., Horanont, T., Sekimoto, Y., et al.: Anomalous event detection on large-scale GPS data from mobile phones using hidden Markov model and cloud platform. In: ACM Ubicomp (2013)
Yuan, J., Zheng, Y., Xie, X.: Discovering regions of different functions in a city using human mobility and pois. In: ACM SIGKDD (2012)
Zhang, Y., Ge, Z., Greenberg, A., Roughan, M.: Network anomography. In: ACM SIGCOMM IMC (2005)
Zhang, D., Huang, J., Li, Y., et al.: Exploring human mobility with multi-source data at extremely large metropolitan scales. In: ACM MobiCom (2014)
Zhang, F., Wilkie, D., Zheng, Y., Xie, X.: Sensing the pulse of urban refueling behavior. In: ACM Ubicomp (2013)
Acknowledgments
This research was supported in part by DoD ARO MURI Award W911NF-12-1-0385, DTRA grant HDTRA1-14-1-0040, and NSF grants CNS-1411636, CNS-1618339, and CNS-1617729.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Arvind Narayanan and Saurabh Verma contributed equally to this work.
This article belongs to the Topical Collection: Special Issue on Social Computing and Big Data Applications
Guest Editors: Xiaoming Fu, Hong Huang, Gareth Tyson, Lu Zheng, and Gang Wang
Appendix
1.1 Proof of Proposition 1
Proof
KDE is a non-parametric way to estimate a probability density function; it leverages the chosen kernel in the input space for smooth estimation. Given sub-manifold density estimates p(yi) for data points \(\mathbf {y}_{i}\in \mathbb {R}^{d} \) ∀i, we want to find a representation \(\mathbf {z}_{i}\in \mathbb {R}^{p}\) ∀i, p < d, such that the new density estimates q(zi) agree with the original density estimates. Here KH, KL denote the kernels in the higher and lower dimensions, h is the kernel bandwidth, and N is the number of data points. The KDEs in the higher and lower dimensions (assuming the bandwidth h remains the same) are given by:

\(p(\mathbf {y}_{i})=\frac {1}{Nh^{d}}\sum \limits _{j=1}^{N} K_{H}\left (\frac {\mathbf {y}_{i}-\mathbf {y}_{j}}{h}\right ), \qquad q(\mathbf {z}_{i})=\frac {1}{Nh^{p}}\sum \limits _{j=1}^{N} K_{L}\left (\frac {\mathbf {z}_{i}-\mathbf {z}_{j}}{h}\right ),\)

\(\text {such that} \int K(u)du =1\). The objective function of the KL-divergence loss for KDE can be computed as follows:
\(\mathcal {J}\) is the objective function of t-SNE (with specific kernels) which upper bounds (with a multiplicative scale and an additive constant) the estimated kernel density estimation loss function. □
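As a sanity check of the KDE construction used in this proof, the following minimal numpy sketch (one-dimensional, Gaussian kernel; the sample size and bandwidth are arbitrary illustrative choices, not values from the paper) compares a kernel density estimate against the true density it approximates.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)   # N samples from a standard normal
h = 0.3                          # kernel bandwidth

def kde(t, data, h):
    """Kernel density estimate (1/(N h)) * sum_j K((t - x_j)/h) with a
    Gaussian kernel K, which satisfies the constraint ∫ K(u) du = 1."""
    u = (t[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / h

t = np.array([0.0, 1.0, 2.0])
est = kde(t, x, h)
true = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)  # exact N(0,1) density
print(np.round(est, 2), np.round(true, 2))
```

The estimate tracks the true density up to the usual smoothing bias of order h² and sampling noise of order 1/√(Nh).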
1.2 Proof of Proposition 2
Schiebinger et al. [19] studied the normalized Laplacian embedding of i.i.d. samples generated from a finite mixture of nonparametric distributions. They showed that when the overlap between mixture components is small and the sample size is large, then with high probability the embedded samples form an orthogonal cone structure (OCS). Figure 18 shows that a (1 − α) fraction of each of the two clusters accumulates within a cone of angle 𝜃 around the orthogonal axes e1 and e2, respectively.
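This orthogonal-cone behavior can be checked numerically. The sketch below (synthetic, well-separated two-component data; the bandwidth and cluster parameters are illustrative choices) embeds points with the top two eigenvectors of the normalized affinity matrix and verifies that the two clusters’ mean embedded vectors are nearly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D Gaussian blobs, mimicking a low-overlap mixture.
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(5.0, 0.3, (100, 2))])

# Gaussian affinity and normalized Laplacian embedding D^{-1/2} W D^{-1/2}.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2.0 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
M = W / np.sqrt(np.outer(d, d))
_, vecs = np.linalg.eigh(M)
E = vecs[:, -2:]                 # top-2 eigenvectors = embedded samples

# Angle between the clusters' mean embedded vectors: close to 90 degrees,
# i.e. the bulk of each cluster lies in its own orthogonal cone.
u, v = E[:100].mean(axis=0), E[100:].mean(axis=0)
cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle = np.degrees(np.arccos(cos))
print(round(float(angle), 1))
```

With near-zero cross-cluster affinity the top eigenvectors are approximately per-cluster indicator vectors (up to a rotation, which preserves the angle), so the measured angle approaches 90°.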
Theorem 1 (Finite-sample angular structure)
There are numbers b, b0, b1, b2, δ, t satisfying certain conditions such that the embedded dataset \(\{\phi (X_{i}),Z_{i}\}^{n}_{i=1}\) has an (α, 𝜃)-OCS with
and holds with probability at least \(1-8K^{2}\exp (\frac {b_{2} n \delta ^{4}}{\delta ^{2}+S_{max}(\overline {\mathbb {P}} )+ B(\overline {\mathbb {P}} )} )\).
Proof of Proposition 2:
Our strategy is to exploit the OCS structure of the input data. Let X ∈RN×D be the normalized data with unit norm, with pij, qij the higher- and lower-dimensional kernel densities and Z1, Z2 the corresponding normalization constants. Let X′∈RN×d, d < D, be the normalized data obtained after LE dimension reduction, with corresponding quantities \(p_{ij}^{\prime },q_{ij}^{\prime },Z_{1}^{\prime },Z_{2}^{\prime }\). Let \(\beta \in (0,\frac {\pi }{2})\) and β′ be the angles between the input feature vectors < xi,xj > and \(<\mathbf {x}^{\prime }_{i},\mathbf {x}^{\prime }_{j}>\), respectively. Constants are denoted by \(c_{1}, c_{1}^{\prime }, c_{2}, c_{3}, a_{1}, a_{2} \geq 0\). Also, σ and σ′ are the kernel bandwidths of the estimated kernel densities on the X and X′ input data, respectively. Let the i-th cluster have Ni samples out of the K clusters; our analysis focuses on this i-th cluster.
Since t-SNE preserves kernel density in lower dimensions, we will have pij = qij and \(p_{ij}^{\prime }=q_{ij}^{\prime }\). Some t-SNE related expressions that we will use for the proof are as follows,
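In standard t-SNE, on which the argument relies, the high- and low-dimensional kernels and their normalization constants take the form:

```latex
p_{ij} = \frac{\exp\left(-\|\mathbf{x}_i-\mathbf{x}_j\|^{2}/2\sigma^{2}\right)}{Z_1},
\qquad
Z_1 = \sum_{k\neq l}\exp\left(-\|\mathbf{x}_k-\mathbf{x}_l\|^{2}/2\sigma^{2}\right),
\qquad
q_{ij} = \frac{\left(1+\|\mathbf{y}_i-\mathbf{y}_j\|^{2}\right)^{-1}}{Z_2},
\qquad
Z_2 = \sum_{k\neq l}\left(1+\|\mathbf{y}_k-\mathbf{y}_l\|^{2}\right)^{-1}.
```

Since the rows of X have unit norm, \(\|\mathbf {x}_{i}-\mathbf {x}_{j}\|^{2} = 2(1-\cos \beta )\), so \(\log p_{ij} = \frac {\cos \beta -1}{\sigma ^{2}} - \log Z_{1}\), which is the quantity that appears in the bound below.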
Similar expressions can be obtained for \(p_{ij}^{\prime },q_{ij}^{\prime },Z_{1}^{\prime }\) and \(Z_{2}^{\prime }\). From these equations, we can show that,
Let \(c_{1} =\sum \limits _{{k\neq l; k,l \neq i,j}}(1+\| \mathbf {y}_{k}-\mathbf {y}_{l} \|^{2})^{-1}\). Now, according to Theorem 1, β′ is bounded in \((\frac {\pi }{2}-2\theta , \frac {\pi }{2}+2\theta )\) with high probability if (i,j) belong to different class labels. In general, for small 𝜃, we can assume that the different clusters form a separation angle (with respect to the origin) such that \(\frac {\pi }{2}-2\theta > \beta \), i.e., β′≥ β for all pairs (i,j). Then, according to (12), \(p_{ij} \geq p_{ij}^{\prime }\) and therefore \(q_{ij} \geq q_{ij}^{\prime }\) if (i,j) belong to different class labels. Equation (13) further yields,
This shows that Lt-SNE always provides a better mapping than t-SNE whenever \(c_{1}^{\prime } \leq c_{1}\), which is generally the case. For small 𝜃, we expect \(p_{kl} > p^{\prime }_{kl}\) \((\Rightarrow q_{kl} > q^{\prime }_{kl})\) for (k,l) belonging to different classes and \(p_{kl} \approx p^{\prime }_{kl}\) \((\Rightarrow q_{kl} \approx q^{\prime }_{kl})\) for (k,l) belonging to the same class. This leads to \(c_{1}^{\prime } \leq c_{1}\), since qkl ∝ (1 + ∥yk −yl∥2)− 1. Next, we establish a lower bound on this mapping ratio using this expression,
For fixed β, the term \((\frac {\cos \beta -1} {\sigma ^{2}}- \log Z_{1})\) is constant. The normalization constant \(Z_{1}^{\prime }\) is the sum of kernel densities between samples within the cluster itself and across the other clusters. From Theorem 1, we know that a (1 − α) fraction of a cluster belongs to an orthogonal cone structure of angle \(\theta \in (0,\frac {\pi }{4})\) with high probability. Ignoring the α fraction of samples (which add positive values to \(Z_{1}^{\prime }\)), we can lower-bound \(Z_{1}^{\prime }\) with the same probability bound as given in Theorem 1 for 𝜃, α.
Finally, plugging \(Z^{\prime }_{1}\) into (14) and setting \(\beta ^{\prime }=\frac {\pi }{2}-\theta \) to obtain the lower bound, we arrive at our final expressions.
Here, \(c_{3}= \exp (\frac {\sqrt {2}\sin (\frac {\pi }{4} -2\theta )} {\sigma ^{\prime 2}}+ 2\log (1-\alpha )+c_{2}) \geq 1\), \(a_{2}=\frac {c_{3}+c_{3}c_{1}-c_{1}^{\prime }-1}{c_{1}^{\prime }}\geq 0\), and \(a_{1}=\frac {c_{3}c_{1}}{c_{1}^{\prime }} \geq 1\) if \(c_{1} \geq c_{1}^{\prime }\), which is the case for small 𝜃. This completes the proof. □
Cite this article
Narayanan, A., Verma, S. & Zhang, ZL. Mining latent patterns in geoMobile data via EPIC. World Wide Web 22, 2771–2798 (2019). https://doi.org/10.1007/s11280-019-00702-z