How do you visit: Identifying addicts from large-scale transit records via scenario deep embedding

GeoInformatica

Abstract

Identification of individuals based on transit modes is of great importance in user tracking systems. However, identifying users in real-life studies is not trivial owing to the following challenges: 1) activity data containing both temporal and spatial context are high-order and sparse; 2) traditional two-step classifiers depend on trajectory patterns as input features, which limits accuracy, especially when the data are scattered and diverse; 3) in some cases, positive instances are few and difficult to detect. As a result, approaches built on statistics-based or trajectory-based features do not work effectively. Deep learning methods also face the problem of how to represent trajectory vectors for user classification. Here, we propose a novel end-to-end scenario-based deep learning method to address these challenges, based on the observation that individuals may visit the same place for different reasons. We first define a scenario using critical places and related trajectories. Next, we embed scenarios via path-based or graph-based approaches using extended embedding techniques. Finally, a two-level convolutional neural network is constructed for the classification. Our model is applied to the problem of detecting addicts directly from transit records, without feature engineering, using real-life data collected from mobile devices. With scenarios constructed from dense trajectories, our model outperforms classical classification approaches, anomaly detection methods, state-of-the-art sequential deep learning models, and graph neural networks. Moreover, we provide statistical analyses and intuitive explanations to interpret the characteristics of resident and addict mobility. Our method could be generalized to other trajectory-related tasks involving scattered and diverse data.
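The abstract does not spell out how a scenario is built or embedded, so the following is only a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a scenario can be approximated as a critical place together with the trajectories passing through it, and that a path-based embedding would be trained on skip-gram style (center, context) pairs in the manner of word2vec. The function names `build_scenarios` and `context_pairs`, the place identifiers, and the window size are all hypothetical.

```python
from collections import defaultdict


def build_scenarios(trajectories, critical_places):
    """Group trajectories by the critical places they pass through.

    trajectories: dict mapping a user id to a time-ordered list of place ids.
    critical_places: set of place ids treated as critical (an assumption).
    Returns a dict mapping each critical place to a list of (user, path) pairs.
    """
    scenarios = defaultdict(list)
    for user, path in trajectories.items():
        # A trajectory joins the scenario of every critical place it visits.
        for place in set(path) & critical_places:
            scenarios[place].append((user, path))
    return dict(scenarios)


def context_pairs(path, window=1):
    """Return skip-gram style (center, context) pairs for one path.

    These pairs are the kind of input a word2vec-like path embedding
    would be trained on; the training step itself is not shown.
    """
    pairs = []
    for i, center in enumerate(path):
        lo = max(0, i - window)
        hi = min(len(path), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, path[j]))
    return pairs


# Hypothetical toy data: three users and one critical place.
trajectories = {
    "u1": ["home", "station_a", "clinic"],
    "u2": ["station_a", "mall"],
    "u3": ["home", "mall"],
}
scenarios = build_scenarios(trajectories, {"station_a"})
pairs = context_pairs(trajectories["u1"], window=1)
```

In this toy data, users `u1` and `u2` both pass through `station_a` but arrive from and leave for different places, which is the kind of difference in how a place is visited that the scenario idea is meant to capture. The two-level convolutional classifier described in the abstract is not sketched here.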

Notes

  1. https://www.fs.fed.us/pnw/starkey/data/tables/

  2. https://github.com/jincanghong/traj_cls/tree/master

  3. https://github.com/scikit-learn-contrib/imbalanced-learn

Acknowledgements

Our research is supported by the Natural Science Foundation of Zhejiang Province of China (Grant No. LY21F020003) and the Zhejiang Provincial Key Research and Development Program of China (Grant No. 2021C01164).

Author information

Corresponding author

Correspondence to Minghui Wu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jin, C., Chen, D., Lin, Z. et al. How do you visit: Identifying addicts from large-scale transit records via scenario deep embedding. Geoinformatica 25, 799–820 (2021). https://doi.org/10.1007/s10707-021-00448-9

  • DOI: https://doi.org/10.1007/s10707-021-00448-9
