Abstract
We coin the term geoMobile data to emphasize datasets that exhibit geo-spatial features reflective of human behaviors. We propose and develop an EPIC framework to mine latent patterns from geoMobile data and provide meaningful interpretations: we first ‘E’xtract latent features from high-dimensional geoMobile datasets via Laplacian Eigenmaps and perform clustering in this latent feature space; we then use a state-of-the-art visualization technique to ‘P’roject these latent features into 2D space; and finally we obtain meaningful ‘I’nterpretations by ‘C’ulling cluster-specific significant feature sets. We illustrate that the local space contraction property of our approach is superior to that of other major dimension-reduction techniques. Using diverse real-world geoMobile datasets, we show the efficacy of our framework via three case studies.
Notes
Although a call involving two neighboring towers may semantically qualify as local, we regard a call as local solely from the tower’s perspective: both the caller and the callee of the call are associated with the same tower.
References
Alsheikh, M.A., Niyato, D., Lin, S., Tan, H.P., Han, Z.: Mobile big data analytics using deep learning and Apache Spark. IEEE Netw. 30(3), 22–29 (2016). https://doi.org/10.1109/MNET.2016.7474340
Baratchi, M., Meratnia, N., Havinga, P.J.M., et al.: A hierarchical hidden semi-Markov model for modeling mobility data. In: ACM Ubicomp (2014)
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003)
Bengio, Y., Paiement, J.F., Vincent, P., et al.: Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In: NIPS (2004)
Fan, Z., Song, X., Shibasaki, R.: CitySpectrum: A non-negative tensor factorization approach. In: ACM Ubicomp (2014)
Hristova, D., Williams, M.J., Musolesi, M., et al.: Measuring urban social diversity using interconnected geo-social networks. In: ACM WWW (2016)
Ihler, A.T., Smyth, P.: Learning time-intensity profiles of human activity using non-parametric Bayesian models. In: NIPS (2006)
Kling, F., Pozdnoukhov, A.: When a city tells a story: urban topic analysis. In: ACM SIGSPATIAL (2012)
Krishnamurthy, A.: High-dimensional clustering with sparse gaussian mixture models. Unpublished paper, pp. 191–192 (2011)
Lakhina, A., Crovella, M., Diot, C.: Diagnosing network-wide traffic anomalies. In: ACM SIGCOMM Computer Communication Review (2004)
Lv, Y., Duan, Y., Kang, W., Li, Z., Wang, F.Y.: Traffic flow prediction with big data: a deep learning approach. IEEE Trans. Intell. Transp. Syst. 16(2), 865–873 (2015)
Zelnik-Manor, L., Perona, P.: Self-tuning spectral clustering. In: NIPS (2005)
Shenzhen Metro: Subway construction plan. http://www.szpl.gov.cn/main/zsgg/200707090211041.shtml (2015)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: Analysis and an algorithm. In: NIPS (2002)
Ozakin, A., Vasiloglou, N., Gray, A.: Manifold learning theory and applications. CRC Press, Boca Raton (2011)
Pressley, A.: Elementary Differential Geometry. Springer, Berlin (2010)
Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: Explicit invariance during feature extraction. In: ICML (2011)
Robusto, C.: The cosine-haversine formula. Am. Math. Mon. 64(1), 38–40 (1957)
Schiebinger, G., Wainwright, M.J., Yu, B., et al.: The geometry of kernelized spectral clustering. Ann. Stat. 43(2), 819–846 (2015)
Städler, N., Mukherjee, S.: Penalized estimation in high-dimensional hidden Markov models with state-specific graphical models. Ann. Appl. Stat. (2013)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
Wallach, H.M., Mimno, D.M., McCallum, A.: Rethinking LDA: Why priors matter. In: NIPS (2009)
Wang, Z., Hu, K., Xu, K., et al.: Structural analysis of network traffic matrix via relaxed principal component pursuit. Comput. Netw. (2012)
Witayangkurn, A., Horanont, T., Sekimoto, Y., et al.: Anomalous event detection on large-scale GPS data from mobile phones using hidden Markov model and cloud platform. In: ACM Ubicomp (2013)
Yuan, J., Zheng, Y., Xie, X.: Discovering regions of different functions in a city using human mobility and pois. In: ACM SIGKDD (2012)
Zhang, Y., Ge, Z., Greenberg, A., Roughan, M.: Network anomography. In: ACM SIGCOMM IMC (2005)
Zhang, D., Huang, J., Li, Y., et al.: Exploring human mobility with multi-source data at extremely large metropolitan scales. In: ACM MobiCom (2014)
Zhang, F., Wilkie, D., Zheng, Y., Xie, X.: Sensing the pulse of urban refueling behavior. In: ACM Ubicomp (2013)
Acknowledgments
This research was supported in part by DoD ARO MURI Award W911NF-12-1-0385, DTRA grant HDTRA1-14-1-0040, and NSF grants CNS-1411636, CNS-1618339, and CNS-1617729.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Arvind Narayanan and Saurabh Verma contributed equally to this work.
This article belongs to the Topical Collection: Special Issue on Social Computing and Big Data Applications
Guest Editors: Xiaoming Fu, Hong Huang, Gareth Tyson, Lu Zheng, and Gang Wang
Appendix
1.1 Proof of Proposition 1
Proof
KDE is a non-parametric way to estimate a probability density function; it leverages the chosen kernel in the input space for smooth estimation. Given sub-manifold density estimates p(yi) for data points \(\mathbf {y}_{i}\in \mathbb {R}^{d} \) ∀i, we want to find a representation \(\mathbf {z}_{i}\in \mathbb {R}^{p}\) ∀i, p < d, such that the new density estimates q(zi) agree with the original density estimates. Here KH, KL denote the kernels in the higher and lower dimensions, h is the kernel bandwidth, and N is the number of data points. The KDEs in the higher and lower dimensions (assuming the bandwidth h remains the same) are given by:

\(p(\mathbf {y}_{i})=\frac {1}{Nh^{d}}\sum \limits _{j=1}^{N} K_{H}\left (\frac {\mathbf {y}_{i}-\mathbf {y}_{j}}{h}\right ), \qquad q(\mathbf {z}_{i})=\frac {1}{Nh^{p}}\sum \limits _{j=1}^{N} K_{L}\left (\frac {\mathbf {z}_{i}-\mathbf {z}_{j}}{h}\right ),\)

\(\text {such that} \int K(u)du =1\). The objective function of the KL-divergence loss for KDE can be computed as follows:
\(\mathcal {J}\) is the objective function of t-SNE (with specific kernels) which upper bounds (with a multiplicative scale and an additive constant) the estimated kernel density estimation loss function. □
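As a sanity check of the KDE construction used in this proof, the following minimal numpy sketch (one-dimensional, Gaussian kernel; the sample size and bandwidth are arbitrary illustrative choices, not values from the paper) compares a kernel density estimate against the true density it approximates.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)   # N samples from a standard normal
h = 0.3                          # kernel bandwidth

def kde(t, data, h):
    """Kernel density estimate (1/(N h)) * sum_j K((t - x_j)/h) with a
    Gaussian kernel K, which satisfies the constraint ∫ K(u) du = 1."""
    u = (t[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / h

t = np.array([0.0, 1.0, 2.0])
est = kde(t, x, h)
true = np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)  # exact N(0,1) density
print(np.round(est, 2), np.round(true, 2))
```

The estimate tracks the true density up to the usual smoothing bias of order h² and sampling noise of order 1/√(Nh).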
1.2 Proof of Proposition 2
Schiebinger et al. [19] studied the normalized Laplacian embedding of i.i.d. samples generated from a finite mixture of nonparametric distributions. They showed that when the overlap between mixture components is small and the sample size is large, then with high probability the embedded samples form an orthogonal cone structure (OCS). Figure 18 shows that a (1 − α) fraction of each of the two clusters accumulates within a cone of angle 𝜃 around the orthogonal axes e1 and e2, respectively.
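This orthogonal-cone behavior can be checked numerically. The sketch below (synthetic, well-separated two-component data; the bandwidth and cluster parameters are illustrative choices) embeds points with the top two eigenvectors of the normalized affinity matrix and verifies that the two clusters’ mean embedded vectors are nearly orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D Gaussian blobs, mimicking a low-overlap mixture.
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
               rng.normal(5.0, 0.3, (100, 2))])

# Gaussian affinity and normalized Laplacian embedding D^{-1/2} W D^{-1/2}.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2.0 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
M = W / np.sqrt(np.outer(d, d))
_, vecs = np.linalg.eigh(M)
E = vecs[:, -2:]                 # top-2 eigenvectors = embedded samples

# Angle between the clusters' mean embedded vectors: close to 90 degrees,
# i.e. the bulk of each cluster lies in its own orthogonal cone.
u, v = E[:100].mean(axis=0), E[100:].mean(axis=0)
cos = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle = np.degrees(np.arccos(cos))
print(round(float(angle), 1))
```

With near-zero cross-cluster affinity the top eigenvectors are approximately per-cluster indicator vectors (up to a rotation, which preserves the angle), so the measured angle approaches 90°.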
Theorem 1 (Finite-sample angular structure)
There are numbers b, b0, b1, b2, δ, t satisfying certain conditions such that the embedded dataset \(\{\phi (X_{i}),Z_{i}\}^{n}_{i=1}\) has an (α, 𝜃)-OCS with
and holds with probability at least \(1-8K^{2}\exp (\frac {b_{2} n \delta ^{4}}{\delta ^{2}+S_{max}(\overline {\mathbb {P}} )+ B(\overline {\mathbb {P}} )} )\).
Proof of Proposition 2:
Our strategy is to exploit the OCS structure of the input data. Let X ∈RN×D be the normalized data with unit norm, with pij, qij the higher- and lower-dimensional kernel densities and Z1, Z2 the corresponding normalization constants. Let X′∈RN×d, d < D, be the normalized data obtained after LE dimension reduction, with corresponding quantities \(p_{ij}^{\prime },q_{ij}^{\prime },Z_{1}^{\prime },Z_{2}^{\prime }\). Let \(\beta \in (0,\frac {\pi }{2})\) and β′ be the angles between the input feature vectors < xi,xj > and \(<\mathbf {x}^{\prime }_{i},\mathbf {x}^{\prime }_{j}>\), respectively. Constants are denoted by \(c_{1}, c_{1}^{\prime }, c_{2}, c_{3}, a_{1}, a_{2} \geq 0\). Also, σ and σ′ are the kernel bandwidths of the estimated kernel densities on the X and X′ input data, respectively. Let the i-th cluster have Ni samples out of the K clusters; our analysis focuses on this i-th cluster.
Since t-SNE preserves kernel density in lower dimensions, we will have pij = qij and \(p_{ij}^{\prime }=q_{ij}^{\prime }\). Some t-SNE related expressions that we will use for the proof are as follows,
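In standard t-SNE, on which the argument relies, the high- and low-dimensional kernels and their normalization constants take the form:

```latex
p_{ij} = \frac{\exp\left(-\|\mathbf{x}_i-\mathbf{x}_j\|^{2}/2\sigma^{2}\right)}{Z_1},
\qquad
Z_1 = \sum_{k\neq l}\exp\left(-\|\mathbf{x}_k-\mathbf{x}_l\|^{2}/2\sigma^{2}\right),
\qquad
q_{ij} = \frac{\left(1+\|\mathbf{y}_i-\mathbf{y}_j\|^{2}\right)^{-1}}{Z_2},
\qquad
Z_2 = \sum_{k\neq l}\left(1+\|\mathbf{y}_k-\mathbf{y}_l\|^{2}\right)^{-1}.
```

Since the rows of X have unit norm, \(\|\mathbf {x}_{i}-\mathbf {x}_{j}\|^{2} = 2(1-\cos \beta )\), so \(\log p_{ij} = \frac {\cos \beta -1}{\sigma ^{2}} - \log Z_{1}\), which is the quantity that appears in the bound below.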
Similar expressions can be obtained for \(p_{ij}^{\prime },q_{ij}^{\prime },Z_{1}^{\prime }\) and \(Z_{2}^{\prime }\). From these equations, we can show that,
Let \(c_{1} =\sum \limits _{{k\neq l; k,l \neq i,j}}(1+\| \mathbf {y}_{k}-\mathbf {y}_{l} \|^{2})^{-1}\). Now, according to Theorem 1, β′ is bounded in \((\frac {\pi }{2}-2\theta , \frac {\pi }{2}+2\theta )\) with high probability if (i,j) belong to different class labels. In general, for small 𝜃, we can assume that the different clusters form a separation angle (with respect to the origin) such that \(\frac {\pi }{2}-2\theta > \beta \), i.e., β′≥ β for all pairs (i,j). Then, according to (12), \(p_{ij} \geq p_{ij}^{\prime }\) and therefore \(q_{ij} \geq q_{ij}^{\prime }\) if (i,j) belong to different class labels. Equation (13) further yields,
This shows that Lt-SNE always provides a better mapping than t-SNE whenever \(c_{1}^{\prime } \leq c_{1}\), which is generally the case. For small 𝜃, we expect \(p_{kl} > p^{\prime }_{kl}\) \((\Rightarrow q_{kl} > q^{\prime }_{kl})\) for (k,l) belonging to different classes and \(p_{kl} \approx p^{\prime }_{kl}\) \((\Rightarrow q_{kl} \approx q^{\prime }_{kl})\) for (k,l) belonging to the same class. This leads to \(c_{1}^{\prime } \leq c_{1}\), since qkl ∝ (1 + ∥yk −yl∥2)− 1. Next, we establish a lower bound on this mapping ratio using this expression,
For fixed β, the term \((\frac {\cos \beta -1} {\sigma ^{2}}- \log Z_{1})\) is constant. The normalization constant \(Z_{1}^{\prime }\) is the sum of kernel densities between samples within the cluster itself and across the other clusters. From Theorem 1, we know that a (1 − α) fraction of a cluster belongs to an orthogonal cone structure of angle \(\theta \in (0,\frac {\pi }{4})\) with high probability. Ignoring the α fraction of samples (which add positive values to \(Z_{1}^{\prime }\)), we can lower-bound \(Z_{1}^{\prime }\) with the same probability bound as given in Theorem 1 for 𝜃, α.
Finally, plugging \(Z^{\prime }_{1}\) into (14) and setting \(\beta ^{\prime }=\frac {\pi }{2}-\theta \) to obtain the lower bound, we arrive at our final expressions.
Here, \(c_{3}= \exp (\frac {\sqrt {2}\sin (\frac {\pi }{4} -2\theta )} {\sigma ^{\prime 2}}+ 2\log (1-\alpha )+c_{2}) \geq 1\), \(a_{2}=\frac {c_{3}+c_{3}c_{1}-c_{1}^{\prime }-1}{c_{1}^{\prime }}\geq 0\), and \(a_{1}=\frac {c_{3}c_{1}}{c_{1}^{\prime }} \geq 1\) if \(c_{1} \geq c_{1}^{\prime }\), which is the case for small 𝜃. This completes the proof. □
Cite this article
Narayanan, A., Verma, S. & Zhang, ZL. Mining latent patterns in geoMobile data via EPIC. World Wide Web 22, 2771–2798 (2019). https://doi.org/10.1007/s11280-019-00702-z