On Privacy in Spatio-Temporal Data: User Identification Using Microblog Data

  • Conference paper
Advances in Spatial and Temporal Databases (SSTD 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10411)

Abstract

Location data is among the most sensitive data regarding the privacy of the observed users. To collect location data, mobile phones and other mobile devices constantly track their positions. This work examines whether publicly available spatio-temporal user data can be used to link newly observed location data to known user profiles. For this study, publicly available location information about Twitter users is used to construct spatio-temporal user profiles describing each user’s movement in space and time. The work shows how to use these profiles to match a new location trace to its user with high accuracy. Furthermore, it shows how to link users across two different trace data sets. For this case study, 15,989 of the most prolific Twitter users in London in 2014 are considered. The experimental results show that the classification approach correctly identifies 98% of the 500 most prolific of these users. Furthermore, it correctly identifies more than 50% of all users considered when given only three observations per user, rather than the whole location trace. This alarming result shows that spatio-temporal data is highly discriminative, putting the privacy of hundreds of millions of geo-social network users at risk. The results further show that the approach can correctly match most Instagram users to the corresponding Twitter users.
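The abstract does not spell out how profiles are built or compared, so the following is only a minimal sketch of the general idea of matching a short trace to known profiles. It assumes, purely for illustration, that each observation is discretized into a (grid cell, hour of day) token and that a trace is assigned to the profile with the highest Jaccard similarity. The grid size `CELL_DEG`, the function names, and the toy users are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import floor

CELL_DEG = 0.01  # hypothetical grid resolution in degrees (an assumption)


def to_token(lat, lon, hour):
    """Discretize one observation into a (grid row, grid column, hour) token."""
    return (floor(lat / CELL_DEG), floor(lon / CELL_DEG), hour)


def build_profiles(observations):
    """Map each user_id to the set of tokens seen in (user_id, lat, lon, hour) rows."""
    profiles = defaultdict(set)
    for user_id, lat, lon, hour in observations:
        profiles[user_id].add(to_token(lat, lon, hour))
    return dict(profiles)


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(trace, profiles):
    """Return the user_id whose profile is most similar to the new trace."""
    tokens = {to_token(lat, lon, hour) for lat, lon, hour in trace}
    return max(profiles, key=lambda user: jaccard(tokens, profiles[user]))


# Toy data: two made-up users with a few observations each.
known = [
    ("alice", 51.507, -0.128, 9),
    ("alice", 51.515, -0.142, 18),
    ("alice", 51.507, -0.128, 13),
    ("bob", 51.538, -0.016, 9),
    ("bob", 51.503, -0.019, 18),
    ("bob", 51.538, -0.016, 22),
]
# A new, unlabeled trace consisting of three observations.
new_trace = [(51.5072, -0.1276, 9), (51.5153, -0.1418, 18), (51.5381, -0.0162, 14)]

profiles = build_profiles(known)
best_match = identify(new_trace, profiles)
print(best_match)
```

Set-based tokens keep the sketch short, but they discard visit frequency. A weighted representation would be a natural refinement, at the cost of a more involved similarity measure.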

Notes

  1. Run-time tests were performed on AWS using an m4.2xlarge EC2 instance running Amazon Linux. This instance type has 8 CPU cores and 32 GB of RAM.

Author information

Corresponding author

Correspondence to Andreas Züfle.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Seglem, E., Züfle, A., Stutzki, J., Borutta, F., Faerman, E., Schubert, M. (2017). On Privacy in Spatio-Temporal Data: User Identification Using Microblog Data. In: Gertz, M., et al. Advances in Spatial and Temporal Databases. SSTD 2017. Lecture Notes in Computer Science, vol 10411. Springer, Cham. https://doi.org/10.1007/978-3-319-64367-0_3

  • DOI: https://doi.org/10.1007/978-3-319-64367-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64366-3

  • Online ISBN: 978-3-319-64367-0

  • eBook Packages: Computer Science, Computer Science (R0)
