Abstract
Location data is among the most privacy-sensitive data collected about users. To collect it, mobile phones and other mobile devices constantly track their positions. This work examines whether publicly available spatio-temporal user data can be used to link newly observed location data to known user profiles. For this study, publicly available location information about Twitter users is used to construct spatio-temporal user profiles describing a user’s movement in space and time. The work shows how to use these profiles to match a new location trace to its user with high accuracy. Furthermore, it shows how to link users across two different trace data sets. The case study considers 15,989 of the most prolific Twitter users in London in 2014. The experimental results show that the classification approach correctly identifies 98% of the 500 most prolific of these users. Furthermore, it correctly identifies more than 50% of users from only three observations per user, rather than their whole location trace. This alarming result shows that spatio-temporal data is highly discriminative, putting the privacy of hundreds of millions of geo-social network users at risk. The results further show that most Instagram users can be correctly matched to Twitter users.
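The abstract does not specify how profiles are represented or which classifier is used, so the following is only a minimal sketch of the general idea of matching a new trace to known profiles. It assumes, purely for illustration, that each observation is discretized into a grid cell plus hour-of-day token and that a trace is assigned to the profile with the highest set overlap (Jaccard similarity). The names `CELL_DEG`, `to_token`, `build_profiles`, and `identify` are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict

# Assumption: a grid of roughly 0.01 degrees and hour-of-day bins.
# The paper's actual discretization is not given in the abstract.
CELL_DEG = 0.01


def to_token(lat, lon, hour):
    """Map one observation to a discrete (cell_y, cell_x, hour) token."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG), hour)


def build_profiles(observations):
    """Build {user_id: set of tokens} from (user_id, lat, lon, hour) tuples."""
    profiles = defaultdict(set)
    for user, lat, lon, hour in observations:
        profiles[user].add(to_token(lat, lon, hour))
    return dict(profiles)


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(trace, profiles):
    """Return the user whose profile overlaps most with the new trace.

    trace: iterable of (lat, lon, hour) tuples from an unknown user.
    Ties (including all-zero similarity) resolve to the first profile seen.
    """
    tokens = {to_token(lat, lon, hour) for lat, lon, hour in trace}
    return max(profiles, key=lambda user: jaccard(tokens, profiles[user]))


if __name__ == "__main__":
    # Made-up coordinates around London, for demonstration only.
    known = [
        ("alice", 51.5074, -0.1278, 9),
        ("alice", 51.5155, -0.0922, 18),
        ("bob", 51.4545, -0.9781, 9),
        ("bob", 51.5033, -0.1195, 13),
    ]
    new_trace = [(51.5079, -0.1271, 9), (51.5152, -0.0925, 18)]
    print(identify(new_trace, build_profiles(known)))
```

A set-based overlap is chosen here only because it is simple to follow; a weighted scheme or a trained classifier would slot into `identify` without changing the profile-building step.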
Notes
1. Run-time tests were performed on AWS using an m4.2xlarge EC2 instance running Amazon Linux. This instance type has 8 CPU cores and 32 GB of RAM.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Seglem, E., Züfle, A., Stutzki, J., Borutta, F., Faerman, E., Schubert, M. (2017). On Privacy in Spatio-Temporal Data: User Identification Using Microblog Data. In: Gertz, M., et al. Advances in Spatial and Temporal Databases. SSTD 2017. Lecture Notes in Computer Science, vol 10411. Springer, Cham. https://doi.org/10.1007/978-3-319-64367-0_3
DOI: https://doi.org/10.1007/978-3-319-64367-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64366-3
Online ISBN: 978-3-319-64367-0
eBook Packages: Computer Science (R0)